Preface 



The third edition of this book continues to demonstrate how to apply probability theory 
to gain insight into real, everyday statistical problems and situations. As in the previous 
editions, carefully developed coverage of probability motivates probabilistic models of real 
phenomena and the statistical procedures that follow. This approach ultimately results 
in an intuitive understanding of statistical procedures and strategies most often used by 
practicing engineers and scientists. 

This book has been written for an introductory course in statistics, or in probability 
and statistics, for students in engineering, computer science, mathematics, statistics, and 
the natural sciences. As such it assumes knowledge of elementary calculus. 

ORGANIZATION AND COVERAGE 

Chapter 1 presents a brief introduction to statistics, covering its two branches, descriptive 
and inferential statistics, along with a short history of the subject and some of the people 
whose early work provided a foundation for work done today. 

The subject matter of descriptive statistics is then considered in Chapter 2. Graphs and 
tables that describe a data set are presented in this chapter, as are quantities that are used 
to summarize certain of the key properties of the data set. 

To be able to draw conclusions from data, it is necessary to understand where the 
data came from. For instance, it is often assumed that the data constitute a 
"random sample" from some population. To understand exactly what this means and 
what its consequences are for relating properties of the sample data to properties of the 
entire population, it is necessary to have some understanding of probability, which 
is the subject of Chapter 3. This chapter introduces the idea of a probability experiment, 
explains the concept of the probability of an event, and presents the axioms of 
probability. 

Our study of probability is continued in Chapter 4, which deals with the important 
concepts of random variables and expectation, and in Chapter 5, which considers some 
special types of random variables that often occur in applications. Such random variables 
as the binomial, Poisson, hypergeometric, normal, uniform, gamma, chi-square, t, and 
F are presented. 

In Chapter 6, we study the probability distribution of such sampling statistics 
as the sample mean and the sample variance. We show how to use a remarkable 
theoretical result of probability, known as the central limit theorem, to approximate 
the probability distribution of the sample mean. In addition, we present the joint 
probability distribution of the sample mean and the sample variance in the important 
special case in which the underlying data come from a normally distributed 
population. 

Chapter 7 shows how to use data to estimate parameters of interest. For instance, a 
scientist might be interested in determining the proportion of Midwestern lakes that are 
afflicted by acid rain. Two types of estimators are studied. The first of these estimates 
the quantity of interest with a single number (for instance, it might estimate that 47 
percent of Midwestern lakes suffer from acid rain), whereas the second provides an estimate 
in the form of an interval of values (for instance, it might estimate that between 
45 and 49 percent of lakes suffer from acid rain). These latter estimators also tell us 
the "level of confidence" we can have in their validity. Thus, for instance, whereas we 
can be pretty certain that the exact percentage of afflicted lakes is not 47, it might very 
well be that we can be, say, 95 percent confident that the actual percentage is between 
45 and 49. 

Chapter 8 introduces the important topic of statistical hypothesis testing, which is 
concerned with using data to test the plausibility of a specified hypothesis. For instance, 
such a test might reject the hypothesis that fewer than 44 percent of Midwestern lakes 
are afflicted by acid rain. The concept of the p-value, which measures the degree of 
plausibility of the hypothesis after the data have been observed, is introduced. A variety 
of hypothesis tests concerning the parameters of both one and two normal populations 
are considered. Hypothesis tests concerning Bernoulli and Poisson parameters are also 
presented. 

Chapter 9 deals with the important topic of regression. Both simple linear 
regression — including such subtopics as regression to the mean, residual analysis, and 
weighted least squares — and multiple linear regression are considered. 

Chapter 10 introduces the analysis of variance. Both one-way and two-way (with and 
without the possibility of interaction) problems are considered. 

Chapter 11 is concerned with goodness of fit tests, which can be used to test whether a 
proposed model is consistent with data. In it we present the classical chi-square goodness 
of fit test and apply it to test for independence in contingency tables. The final section 
of this chapter introduces the Kolmogorov-Smirnov procedure for testing whether data 
come from a specified continuous probability distribution. 

Chapter 12 deals with nonparametric hypothesis tests, which can be used when one 
cannot assume that the underlying distribution has some specified parametric form 
(such as normal). 

Chapter 13 considers the subject matter of quality control, a key statistical technique 
in manufacturing and production processes. A variety of control charts, including not only 
the Shewhart control charts but also more sophisticated ones based on moving averages 
and cumulative sums, are considered. 

Chapter 14 deals with problems related to life testing. In this chapter the exponential 
distribution, rather than the normal, plays the key role. 
NEW TO THIS EDITION 

New exercises and real data examples have been added throughout, including: 

• The One-sided Chebyshev Inequality for Data (Section 2.4) 

• The Logistic Distribution and Logistic Regression (Sections 5.4 and 9.11) 

• Estimation and Testing in proofreader problems (Examples 7.2b and 8.7g) 

• Product Form Estimates of Life Distributions (Section 7.2.1) 

• Observational Studies (Example 8.6e) 

About the CD 

Packaged along with the text is a PC disk that can be used to solve most of the statistical 
problems in the text. For instance, the disk computes the p-values for most of the hypothesis 
tests, including those related to the analysis of variance and to regression. It can also be 
used to obtain probabilities for most of the common distributions. (For those students 
without access to a personal computer, tables that can be used to solve all of the problems 
in the text are provided.) 

One program on the disk illustrates the central limit theorem. It considers random 
variables that take on one of the values 0, 1, 2, 3, 4, and allows the user to enter the 
probabilities for these values along with an integer n. The program then plots the probability 
mass function of the sum of n independent random variables having this distribution. By 
increasing n, one can "see" the mass function converge to the shape of a normal density 
function. 
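The disk program itself is not reproduced in the text. As a rough stand-in, here is a minimal Python sketch of the computation it is described as performing: given probabilities for the values 0 through 4, it obtains the mass function of the sum of n independent copies by repeated discrete convolution and prints a crude text plot. The function names `convolve` and `sum_pmf` and the input probabilities are our own assumptions.

```python
from fractions import Fraction


def convolve(p, q):
    """Mass function of X + Y for independent X, Y with mass functions p and q.

    p[i] is P{X = i}; the result r satisfies r[k] = sum over i of p[i] * q[k - i].
    """
    r = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            r[i + j] += pi * qj
    return r


def sum_pmf(probs, n):
    """Mass function of the sum of n independent variables, each with mass function probs.

    probs[v] is the probability of the value v (v = 0, 1, 2, 3, 4 in the disk program).
    """
    if abs(sum(probs) - 1) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    result = list(probs)
    for _ in range(n - 1):
        result = convolve(result, probs)
    return result


if __name__ == "__main__":
    # Hypothetical, deliberately skewed distribution on 0..4 (exact fractions).
    probs = [Fraction(1, 2), Fraction(1, 5), Fraction(1, 10), Fraction(1, 10), Fraction(1, 10)]
    for n in (1, 4, 16):
        pmf = sum_pmf(probs, n)
        peak = max(pmf)
        print(f"n = {n}")
        for k, pk in enumerate(pmf):
            # Bar length is proportional to P{sum = k}.
            print(f"{k:4d} {'#' * round(60 * pk / peak)}")
```

For n = 1 the bars simply reproduce the skewed input; as n grows the outline becomes increasingly bell-shaped, which is the behavior the central limit theorem predicts.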
INTRODUCTION TO STATISTICS 



1.1 INTRODUCTION 

It has become accepted in today's world that in order to learn about something, you must 
first collect data. Statistics is the art of learning from data. It is concerned with the collection 
of data, its subsequent description, and its analysis, which often leads to the drawing of 
conclusions. 



1.2 DATA COLLECTION AND DESCRIPTIVE STATISTICS 

Sometimes a statistical analysis begins with a given set of data: For instance, the government 
regularly collects and publicizes data concerning yearly precipitation totals, earthquake 
occurrences, the unemployment rate, the gross domestic product, and the rate of inflation. 
Statistics can be used to describe, summarize, and analyze these data. 

In other situations, data are not yet available; in such cases statistical theory can be used to 
design an appropriate experiment to generate data. The experiment chosen should depend 
on the use that one wants to make of the data. For instance, suppose that an instructor 
is interested in determining which of two different methods for teaching computer 
programming to beginners is more effective. To study this question, the instructor might 
divide the students into two groups, and use a different teaching method for each group. 
At the end of the class the students can be tested and the scores of the members of the 
different groups compared. If the data, consisting of the test scores of members of each 
group, are significantly higher in one of the groups, then it might seem reasonable to 
suppose that the teaching method used for that group is superior. 

It is important to note, however, that in order to be able to draw a valid conclusion 
from the data, it is essential that the students were divided into groups in such a manner 
that neither group was more likely to have the students with greater natural aptitude for 
programming. For instance, the instructor should not have let the male class members be 
one group and the females the other. For if so, then even if the women scored significantly 
higher than the men, it would not be clear whether this was due to the method used 
to teach them, or to the fact that women may be inherently better than men at learning 
programming skills. The accepted way of avoiding this pitfall is to divide the class members 
into the two groups "at random." This term means that the division is done in such 
a manner that all possible choices of the members of a group are equally likely. 
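A minimal Python sketch of such a random division, assuming the class is split into two groups of (nearly) equal size; the roster and the function name `random_split` are hypothetical. Shuffling the roster uniformly and cutting it in half makes every possible choice of members for the first group equally likely (up to the quality of Python's pseudorandom generator).

```python
import random


def random_split(students, seed=None):
    """Divide students into two groups "at random".

    A uniform shuffle makes every ordering equally likely, so every choice of
    members for the first group (of size len // 2) is equally likely.
    """
    rng = random.Random(seed)
    shuffled = list(students)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


if __name__ == "__main__":
    # Hypothetical class roster of 20 students.
    roster = [f"student_{i}" for i in range(1, 21)]
    group_a, group_b = random_split(roster, seed=2004)
    print("Method A:", group_a)
    print("Method B:", group_b)
```

Fixing the seed only makes a run reproducible; it does not change the fact that, before the seed is chosen, no student is more likely to land in one group than the other.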

At the end of the experiment, the data should be described. For instance, the scores 
of the two groups should be presented. In addition, summary measures such as the average 
score of members of each of the groups should be presented. This part of statistics, 
concerned with the description and summarization of data, is called descriptive statistics. 

1.3 INFERENTIAL STATISTICS AND PROBABILITY MODELS 

After the preceding experiment is completed and the data are described and summarized, 
we hope to be able to draw a conclusion about which teaching method is superior. This 
part of statistics, concerned with the drawing of conclusions, is called inferential statistics. 

To be able to draw a conclusion from the data, we must take into account the possibility 
of chance. For instance, suppose that the average score of members of the first group is 
quite a bit higher than that of the second. Can we conclude that this increase is due to the 
teaching method used? Or is it possible that the teaching method was not responsible for 
the increased scores but rather that the higher scores of the first group were just a chance 
occurrence? For instance, the fact that a coin comes up heads 7 times in 10 flips does 
not necessarily mean that the coin is more likely to come up heads than tails in future 
flips. Indeed, it could be a perfectly ordinary coin that, by chance, just happened to land 
heads 7 times out of the total of 10 flips. (On the other hand, if the coin had landed 
heads 47 times out of 50 flips, then we would be quite certain that it was not an ordinary 
coin.) 
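To put numbers on the coin example, the following sketch (our own illustration, anticipating the binomial distribution of Chapter 5) computes, for a fair coin, the exact probability of at least 7 heads in 10 flips and of at least 47 heads in 50 flips. The function name `prob_at_least` is ours.

```python
from fractions import Fraction
from math import comb


def prob_at_least(k, n):
    """P{at least k heads in n independent flips of a fair coin}.

    Each of the 2**n head/tail sequences is equally likely, and comb(n, j)
    of them contain exactly j heads.
    """
    favorable = sum(comb(n, j) for j in range(k, n + 1))
    return Fraction(favorable, 2 ** n)


if __name__ == "__main__":
    p7 = prob_at_least(7, 10)
    p47 = prob_at_least(47, 50)
    print(f"P(at least 7 heads in 10 flips)  = {p7} ~ {float(p7):.4f}")
    print(f"P(at least 47 heads in 50 flips) = {float(p47):.2e}")
```

The first probability is 176/1024 = 11/64, about 0.17, so 7 heads in 10 flips is quite ordinary; the second is 20,876/2^50, on the order of 10^-11, which is why such a result would make us doubt the coin.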

To be able to draw logical conclusions from data, we usually make some assumptions 
about the chances (or probabilities) of obtaining the different data values. The totality of 
these assumptions is referred to as a probability model for the data. 

Sometimes the nature of the data suggests the form of the probability model that is 
assumed. For instance, suppose that an engineer wants to find out what proportion of 
computer chips, produced by a new method, will be defective. The engineer might select 
a group of these chips, with the resulting data being the number of defective chips in this 
group. Provided that the chips selected were "randomly" chosen, it is reasonable to suppose 
that each one of them is defective with probability p, where p is the unknown proportion 
of all the chips produced by the new method that will be defective. The resulting data can 
then be used to make inferences about p. 
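The model just described treats each selected chip as defective, independently, with probability p. A minimal simulation sketch, assuming a hypothetical true value p = 0.05 and a sample of 2,000 chips (neither figure comes from the text), shows that the observed proportion of defectives lands close to p; turning that proportion into a formal estimate of p is the subject of Chapter 7. The function name `count_defective` is ours.

```python
import random


def count_defective(n_chips, p, rng):
    """Number of defectives among n_chips, each defective independently with probability p."""
    return sum(1 for _ in range(n_chips) if rng.random() < p)


if __name__ == "__main__":
    rng = random.Random(7)   # fixed seed so the run is reproducible
    true_p = 0.05            # hypothetical; unknown to the engineer in practice
    n = 2000
    defective = count_defective(n, true_p, rng)
    p_hat = defective / n    # sample proportion, used to make inferences about p
    print(f"{defective} defective out of {n}; estimated p = {p_hat:.4f}")
```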

In other situations, the appropriate probability model for a given data set will not be 
readily apparent. However, careful description and presentation of the data sometimes 
enable us to infer a reasonable model, which we can then try to verify with the use of 
additional data. 

Because the basis of statistical inference is the formulation of a probability model to 
describe the data, an understanding of statistical inference requires some knowledge of 
the theory of probability. In other words, statistical inference starts with the assumption 
that important aspects of the phenomenon under study can be described in terms of 
probabilities; it then draws conclusions by using data to make inferences about these 
probabilities. 



1.4 POPULATIONS AND SAMPLES 

In statistics, we are interested in obtaining information about a total collection of elements, 
which we will refer to as the population. The population is often too large for us to examine 
each of its members. For instance, we might have all the residents of a given state, or all the 
television sets produced in the last year by a particular manufacturer, or all the households 
in a given community. In such cases, we try to learn about the population by choosing 
and then examining a subgroup of its elements. This subgroup of a population is called 
a sample. 

If the sample is to be informative about the total population, it must be, in some sense, 
representative of that population. For instance, suppose that we are interested in learning 
about the age distribution of people residing in a given city, and we obtain the ages of the 
first 100 people to enter the town library. If the average age of these 100 people is 46.2 
years, are we justified in concluding that this is approximately the average age of the entire 
population? Probably not, for we could certainly argue that the sample chosen in this case 
is probably not representative of the total population because usually more young students 
and senior citizens use the library than do working-age citizens. 

In certain situations, such as the library illustration, we are presented with a sample and 
must then decide whether this sample is reasonably representative of the entire population. 
In practice, a given sample generally cannot be assumed to be representative of a population 
unless that sample has been chosen in a random manner. This is because any specific 
nonrandom rule for selecting a sample often results in one that is inherently biased toward 
some data values as opposed to others. 

Thus, although it may seem paradoxical, we are most likely to obtain a representative 
sample by choosing its members in a totally random fashion without any prior considerations 
of the elements that will be chosen. In other words, we need not attempt to 
deliberately choose the sample so that it contains, for instance, the same gender percentage 
and the same percentage of people in each profession as found in the general population. 
Rather, we should just leave it up to "chance" to obtain roughly the correct percentages. 
Once a random sample is chosen, we can use statistical inference to draw conclusions about 
the entire population by studying the elements of the sample. 
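A minimal sketch of "leaving it up to chance": Python's `random.sample` draws without replacement so that every subset of the requested size is equally likely. The population below (100,000 people, 52 percent of them women) and the function name `sample_proportion` are invented for illustration; the point is only that the sample percentage comes out close to the population value without any deliberate balancing.

```python
import random


def sample_proportion(population, sample_size, is_member, rng):
    """Proportion of a simple random sample whose members satisfy is_member."""
    sample = rng.sample(population, sample_size)
    return sum(1 for person in sample if is_member(person)) / sample_size


if __name__ == "__main__":
    rng = random.Random(1936)
    # Hypothetical population: 52,000 women and 48,000 men.
    population = ["F"] * 52_000 + ["M"] * 48_000
    p = sample_proportion(population, 1_000, lambda person: person == "F", rng)
    print(f"Sample proportion of women: {p:.3f} (population value 0.520)")
```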

1.5 A BRIEF HISTORY OF STATISTICS 

A systematic collection of data on the population and the economy was begun in the Italian 
city states of Venice and Florence during the Renaissance. The term statistics, derived from 
the word state, was used to refer to a collection of facts of interest to the state. The idea of 
collecting data spread from Italy to the other countries of Western Europe. Indeed, by the 
first half of the 16th century it was common for European governments to require parishes 
to register births, marriages, and deaths. Because of poor public health conditions this last 
statistic was of particular interest. 

The high mortality rate in Europe before the 19th century was due mainly to epidemic 
diseases, wars, and famines. Among epidemics, the worst were the plagues. Starting with 
the Black Plague in 1348, plagues recurred frequently for nearly 400 years. In 1562, as a 
way to alert the King's court to consider moving to the countryside, the City of London 
began to publish weekly bills of mortality. Initially these mortality bills listed the places 
of death and whether a death had resulted from plague. Beginning in 1625 the bills were 
expanded to include all causes of death. 

In 1662 the English tradesman John Graunt published a book entitled Natural and 
Political Observations Made upon the Bills of Mortality. Table 1.1, which notes the total 
number of deaths in England and the number due to the plague for five different plague 
years, is taken from this book. 



TABLE 1.1 Total Deaths in England 

Year    Burials    Plague Deaths 
1592    25,886     11,503 
1593    17,844     10,662 
1603    37,294     30,561 
1625    51,758     35,417 
1636    23,359     10,400 

Source: John Graunt, Observations Made upon the Bills of Mortality. 
3rd ed. London: John Martyn and James Allestry (1st ed. 1662). 

Graunt used London bills of mortality to estimate the city's population. For instance, 
to estimate the population of London in 1660, Graunt surveyed households in certain 
London parishes (or neighborhoods) and discovered that, on average, there were approximately 
3 deaths for every 88 people. Dividing by 3 shows that, on average, there was 
roughly 1 death for every 88/3 people. Because the London bills cited 13,200 deaths in 
London for that year, Graunt estimated the London population to be about 

13,200 x 88/3 = 387,200 
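Graunt's arithmetic fits in a one-line function. The sketch below uses exact fractions so that 88/3 is not rounded; the function name `graunt_population` is ours, and the only inputs are the two figures quoted above.

```python
from fractions import Fraction


def graunt_population(deaths, people_per_death):
    """Population estimate: recorded deaths times the number of people per death."""
    return deaths * people_per_death


if __name__ == "__main__":
    # 3 deaths per 88 people means one death per 88/3 people.
    people_per_death = Fraction(88, 3)
    estimate = graunt_population(13_200, people_per_death)
    print(estimate)  # 387200
```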

Graunt used this estimate to project a figure for all England. In his book he noted that 
these figures would be of interest to the rulers of the country, as indicators of both the 
number of men who could be drafted into an army and the number who could be taxed. 
Graunt also used the London bills of mortality — and some intelligent guesswork as to 
what diseases killed whom and at what age — to infer ages at death. (Recall that the bills 
of mortality listed only causes and places of death, not the ages of those dying.) Graunt 
then used this information to compute tables giving the proportion of the population that 
dies at various ages. 

TABLE 1.2 John Graunt's Mortality Table 

Age at Death      Number of Deaths per 100 Births 
0-6               36 
6-16              24 
16-26             15 
26-36             9 
36-46             6 
46-56             4 
56-66             3 
66-76             2 
76 and greater    1 

Note: The categories go up to but do not include the right-hand value. For instance, 
0-6 means all ages from 0 up through 5. 

Table 1.2 is one of Graunt's mortality tables. It states, for instance, 
that of 100 births, 36 people will die before reaching age 6, 24 will die between the ages of 
6 and 15, and so on. 

Graunt's estimates of the ages at which people were dying were of great interest to those 
in the business of selling annuities. Annuities are the opposite of life insurance in that one 
pays in a lump sum as an investment and then receives regular payments for as long as one 
lives. 

Graunt's work on mortality tables inspired further work by Edmund Halley in 1693. 
Halley, the discoverer of the comet bearing his name (and also the man who was most 
responsible, by both his encouragement and his financial support, for the publication of 
Isaac Newton's famous Principia Mathematica), used tables of mortality to compute the 
odds that a person of any age would live to any other particular age. Halley was influential 
in convincing the insurers of the time that an annual life insurance premium should depend 
on the age of the person being insured. 

Following Graunt and Halley, the collection of data steadily increased throughout 
the remainder of the 17th and on into the 18th century. For instance, the city of Paris 
began collecting bills of mortality in 1667; and by 1730 it had become common practice 
throughout Europe to record ages at death. 

The term statistics, which was used until the 18th century as a shorthand for the 
descriptive science of states, became in the 19th century increasingly identified with 
numbers. By the 1830s the term was almost universally regarded in Britain and France 
as being synonymous with the "numerical science" of society. This change in meaning 
was caused by the wide availability of census records and other tabulations that began to 
be systematically collected and published by the governments of Western Europe and the 
United States beginning around 1800. 

Throughout the 19th century, although probability theory had been developed by such 
mathematicians as Jacob Bernoulli, Carl Friedrich Gauss, and Pierre-Simon Laplace, its 
use in studying statistical findings was almost nonexistent, because most social statisticians 
at the time were content to let the data speak for themselves. In particular, statisticians 
of that time were not interested in drawing inferences about individuals, but rather were 
concerned with the society as a whole. Thus, they were not concerned with sampling but 
rather tried to obtain censuses of the entire population. As a result, probabilistic inference 
from samples to a population was almost unknown in 19th century social statistics. 

It was not until the late 1800s that statistics became concerned with inferring conclusions 
from numerical data. The movement began with Francis Galton's work on analyzing 
hereditary genius through the uses of what we would now call regression and correlation 
analysis (see Chapter 9), and obtained much of its impetus from the work of Karl Pearson. 
Pearson, who developed the chi-square goodness of fit tests (see Chapter 11), was the first 
director of the Galton Laboratory, endowed by Francis Galton in 1904. There Pearson 
originated a research program aimed at developing new methods of using statistics in 
inference. His laboratory invited advanced students from science and industry to learn 
statistical methods that could then be applied in their fields. One of his earliest visiting 
researchers was W. S. Gosset, a chemist by training, who showed his devotion to Pearson 
by publishing his own works under the name "Student." (A famous story has it that Gosset 
was afraid to publish under his own name for fear that his employers, the Guinness brewery, 
would be unhappy to discover that one of its chemists was doing research in statistics.) 
Gosset is famous for his development of the t-test (see Chapter 8). 

Two of the most important areas of applied statistics in the early 20th century were 
population biology and agriculture. This was due to the interest of Pearson and others at 
his laboratory and also to the remarkable accomplishments of the English scientist Ronald 
A. Fisher. The theory of inference developed by these pioneers, including among others 
Karl Pearson's son Egon and the Polish-born mathematical statistician Jerzy Neyman, 
was general enough to deal with a wide range of quantitative and practical problems. As 
a result, after the early years of the 20th century a rapidly increasing number of people 
in science, business, and government began to regard statistics as a tool that was able to 
provide quantitative solutions to scientific and practical problems (see Table 1.3). 

TABLE 1.3 The Changing Definition of Statistics 

Statistics has then for its object that of presenting a faithful representation of a state at a determined 
epoch. (Quetelet, 1849) 

Statistics are the only tools by which an opening can be cut through the formidable thicket of 
difficulties that bars the path of those who pursue the Science of man. (Galton, 1889) 

Statistics may be regarded (i) as the study of populations, (ii) as the study of variation, and (iii) as the 
study of methods of the reduction of data. (Fisher, 1925) 

Statistics is a scientific discipline concerned with collection, analysis, and interpretation of data obtained 
from observation or experiment. The subject has a coherent structure based on the theory of 
Probability and includes many different procedures which contribute to research and development 
throughout the whole of Science and Technology. (E. Pearson, 1936) 

Statistics is the name for that science and art which deals with uncertain inferences, which uses 
numbers to find out something about nature and experience. (Weaver, 1952) 

Statistics has become known in the 20th century as the mathematical tool for analyzing experimental 
and observational data. (Porter, 1986) 

Statistics is the art of learning from data. (this book, 2004) 

Nowadays the ideas of statistics are everywhere. Descriptive statistics are featured in 
every newspaper and magazine. Statistical inference has become indispensable to public 
health and medical research, to engineering and scientific studies, to marketing and quality 
control, to education, to accounting, to economics, to meteorological forecasting, to 
polling and surveys, to sports, to insurance, to gambling, and to all research that makes 
any claim to being scientific. Statistics has indeed become ingrained in our intellectual 
heritage. 



Problems 

1. An election will be held next week and, by polling a sample of the voting 
population, we are trying to predict whether the Republican or Democratic 
candidate will prevail. Which of the following methods of selection is likely to 
yield a representative sample? 

(a) Poll all people of voting age attending a college basketball game. 

(b) Poll all people of voting age leaving a fancy midtown restaurant. 

(c) Obtain a copy of the voter registration list, randomly choose 100 names, and 
question them. 

(d) Use the results of a television call-in poll, in which the station asked its viewers 
to call in and name their choice. 

(e) Choose names from the telephone directory and call these people. 

2. The approach used in Problem 1(e) led to a disastrous prediction in the 1936 
presidential election, in which Franklin Roosevelt defeated Alfred Landon by a 
landslide. A Landon victory had been predicted by the Literary Digest. The magazine 
based its prediction on the preferences of a sample of voters chosen from lists 
of automobile and telephone owners. 

(a) Why do you think the Literary Digest's prediction was so far off? 

(b) Has anything changed between 1936 and now that would make you believe 
that the approach used by the Literary Digest would work better today? 

3. A researcher is trying to discover the average age at death for people in the United 
States today. To obtain data, the obituary columns of the New York Times are read 
for 30 days, and the ages at death of people in the United States are noted. Do 
you think this approach will lead to a representative sample? 
4. To determine the proportion of people in your town who are smokers, it has been 
decided to poll people at one of the following local spots: 

(a) the pool hall; 

(b) the bowling alley; 

(c) the shopping mall; 

(d) the library. 

Which of these potential polling places would most likely result in a reasonable 
approximation to the desired proportion? Why? 

5. A university plans on conducting a survey of its recent graduates to determine 
information on their yearly salaries. It randomly selected 200 recent graduates and 
sent them questionnaires dealing with their present jobs. Of these 200, however, 
only 86 were returned. Suppose that the average of the yearly salaries reported was 
$75,000. 

(a) Would the university be correct in thinking that $75,000 was a good approximation 
to the average salary level of all of its graduates? Explain the reasoning 
behind your answer. 

(b) If your answer to part (a) is no, can you think of any set of conditions relating 
to the group that returned questionnaires for which it would be a good 
approximation? 

6. An article reported that a survey of clothing worn by pedestrians killed at night in 
traffic accidents revealed that about 80 percent of the victims were wearing dark-colored 
clothing and 20 percent were wearing light-colored clothing. The conclusion 
drawn in the article was that it is safer to wear light-colored clothing at night. 

(a) Is this conclusion justified? Explain. 

(b) If your answer to part (a) is no, what other information would be needed 
before a final conclusion could be drawn? 

7. Critique Graunt's method for estimating the population of London. What 
implicit assumption is he making? 

8. The London bills of mortality listed 12,246 deaths in 1658. Supposing that a 
survey of London parishes showed that roughly 2 percent of the population died 
that year, use Graunt's method to estimate London's population in 1658. 

9. Suppose you were a seller of annuities in 1662 when Graunt's book was published. 
Explain how you would make use of his data on the ages at which people were 
dying. 

10. Based on Graunt's mortality table: 

(a) What proportion of people survived to age 6? 

(b) What proportion survived to age 46? 

(c) What proportion died between the ages of 6 and 36? 




DESCRIPTIVE STATISTICS 



2.1 INTRODUCTION 

In this chapter we introduce the subject matter of descriptive statistics, and in doing 
so learn ways to describe and summarize a set of data. Section 2.2 deals with ways of 
describing a data set. Subsections 2.2.1 and 2.2.2 indicate how data that take on only 
a relatively few distinct values can be described by using frequency tables or graphs, whereas 
Subsection 2.2.3 deals with data whose set of values is grouped into different intervals. 
Section 2.3 discusses ways of summarizing data sets by use of statistics, which are numerical 
quantities whose values are determined by the data. Subsection 2.3.1 considers three 
statistics that are used to indicate the "center" of the data set: the sample mean, the sample 
median, and the sample mode. Subsection 2.3.2 introduces the sample variance and its 
square root, called the sample standard deviation. These statistics are used to indicate the 
spread of the values in the data set. Subsection 2.3.3 deals with sample percentiles, which 
are statistics that tell us, for instance, which data value is greater than 95 percent of all 
the data. In Section 2.4 we present Chebyshev's inequality for sample data. This famous 
inequality gives an upper bound on the proportion of the data that can differ from the 
sample mean by more than k times the sample standard deviation. Whereas Chebyshev's 
inequality holds for all data sets, in certain situations, which are discussed in 
Section 2.5, we can obtain more precise estimates of the proportion of the data that is within k 
sample standard deviations of the sample mean. In particular, when a graph 
of the data follows a bell-shaped form, the data set is said to be approximately normal, and 
more precise estimates are given by the so-called empirical rule. Section 2.6 is concerned 
with situations in which the data consist of paired values. A graphical technique, called 
the scatter diagram, for presenting such data is introduced, as is the sample correlation 
coefficient, a statistic that indicates the degree to which a large value of the first member 
of the pair tends to go along with a large value of the second. 
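As a preview of the Chebyshev discussion, here is a minimal sketch on a small hypothetical data set. It computes the sample mean and the sample standard deviation s (divisor n - 1) and checks that, for several k, the proportion of data strictly within k s of the mean is at least 1 - (n - 1)/(n k^2). That bound follows directly from the definition of s (at most (n - 1)/k^2 points can lie k s or more from the mean), and it exceeds the familiar 1 - 1/k^2. The data and function names are ours.

```python
from statistics import mean, stdev


def proportion_within(data, k):
    """Proportion of data strictly within k sample standard deviations of the sample mean."""
    xbar = mean(data)
    s = stdev(data)  # sample standard deviation, divisor n - 1
    return sum(1 for x in data if abs(x - xbar) < k * s) / len(data)


def chebyshev_lower_bound(n, k):
    """Lower bound 1 - (n - 1)/(n k^2) on that proportion; it exceeds 1 - 1/k^2."""
    return 1 - (n - 1) / (n * k ** 2)


if __name__ == "__main__":
    data = [2, 3, 3, 4, 5, 5, 5, 6, 7, 20]  # hypothetical data with one outlier
    for k in (1.5, 2, 3):
        observed = proportion_within(data, k)
        bound = chebyshev_lower_bound(len(data), k)
        print(f"k = {k}: observed {observed:.2f} >= bound {bound:.3f}")
```

For these data the observed proportions are 0.9, 0.9, and 1.0, against bounds of 0.6, 0.775, and 0.9: the inequality holds, but for most data sets it is conservative, which is why the sharper empirical rule is useful when the data are approximately normal.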

2.2 DESCRIBING DATA SETS 

The numerical findings of a study should be presented clearly, concisely, and in such 
a manner that an observer can quickly obtain a feel for the essential characteristics of 
the data. Over the years it has been found that tables and graphs are particularly useful 
ways of presenting data, often revealing important features such as the range, the degree 
of concentration, and the symmetry of the data. In this section we present some common 
graphical and tabular ways for presenting data. 

2.2.1 Frequency Tables and Graphs 

A data set having a relatively small number of distinct values can be conveniently presented 
in a frequency table. For instance, Table 2.1 is a frequency table for a data set consisting of the 
starting yearly salaries (to the nearest thousand dollars) of 42 recently graduated students 
with B.S. degrees in electrical engineering. Table 2.1 tells us, among other things, that the 
lowest starting salary of $47,000 was received by four of the graduates, whereas the highest 
salary of $60,000 was received by a single student. The most common starting salary was 
$52,000, which was received by 10 of the students. 

TABLE 2.1 Starting Yearly Salaries 

Starting Salary (thousands of dollars)    Frequency 
47                                        4 
48                                        1 
49                                        3 
50                                        5 
51                                        8 
52                                        10 
53                                        0 
54                                        5 
56                                        2 
57                                        3 
60                                        1 

Data from a frequency table can be graphically represented by a line graph that plots the 
distinct data values on the horizontal axis and indicates their frequencies by the heights of 
vertical lines. A line graph of the data presented in Table 2.1 is shown in Figure 2.1. 

When the lines in a line graph are given added thickness, the graph is called a bar graph. 
Figure 2.2 presents a bar graph. 

Another type of graph used to represent a frequency table is the frequency polygon, which 
plots the frequencies of the different data values on the vertical axis, and then connects the 
plotted points with straight lines. Figure 2.3 presents a frequency polygon for the data of 
Table 2.1. 

FIGURE 2.1 Starting salary data. [Line graph; horizontal axis: starting salary.] 

FIGURE 2.2 Bar graph for starting salary data. [Horizontal axis: starting salary.] 

FIGURE 2.3 Frequency polygon for starting salary data. [Horizontal axis: starting salary.] 

2.2.2 Relative Frequency Tables and Graphs 

Consider a data set consisting of n values. If f is the frequency of a particular value, then 
the ratio f/n is called its relative frequency. That is, the relative frequency of a data value is 
the proportion of the data that have that value. The relative frequencies can be represented 
graphically by a relative frequency line or bar graph or by a relative frequency polygon. 
Indeed, these relative frequency graphs will look like the corresponding graphs of the 
absolute frequencies except that the labels on the vertical axis are now the old labels (that 
gave the frequencies) divided by the total number of data points. 

EXAMPLE 2.2a Table 2.2 is a relative frequency table for the data of Table 2.1. The relative 
frequencies are obtained by dividing the corresponding frequencies of Table 2.1 by 
42, the size of the data set. ■ 
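The division in Example 2.2a is easy to automate. Below is a minimal Python sketch, assuming the frequency table is supplied as a dictionary of (value, frequency) pairs; the names `salary_freq` and `relative_frequencies` are illustrative, not from the text. The value 53 is listed with frequency 0, since the other ten frequencies already account for all 42 graduates.

```python
# Table 2.1: starting salary (thousands of dollars) -> frequency
salary_freq = {47: 4, 48: 1, 49: 3, 50: 5, 51: 8, 52: 10,
               53: 0, 54: 5, 56: 2, 57: 3, 60: 1}

def relative_frequencies(freq):
    """Map each value to f/n, its proportion of the n observations."""
    n = sum(freq.values())
    return {value: f / n for value, f in freq.items()}

n = sum(salary_freq.values())          # size of the data set
rel = relative_frequencies(salary_freq)
for value in sorted(rel):
    print(f"{value}  {salary_freq[value]}/{n} = {rel[value]:.4f}")
```

The relative frequencies always sum to 1, which is a quick check on a hand-built table.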

A pie chart is often used to indicate relative frequencies when the data are not numerical 
in nature. A circle is constructed and then sliced into different sectors, one for each distinct 
type of data value. The relative frequency of a data value is indicated by the area of its sector, 
this area being equal to the total area of the circle multiplied by the relative frequency of 
the data value. 

EXAMPLE 2.2b The following data relate to the different types of cancers affecting the 200 
most recent patients to enroll at a clinic specializing in cancer. These data are represented 
in the pie chart presented in Figure 2.4. ■ 
TABLE 2.2 

Starting Salary (thousands of dollars)    Relative Frequency 
47                                        4/42 = .0952 
48                                        1/42 = .0238 
49                                        3/42 
50                                        5/42 
51                                        8/42 
52                                        10/42 
53                                        0 
54                                        5/42 
56                                        2/42 
57                                        3/42 
60                                        1/42 



FIGURE 2.4 [Pie chart of the cancer data: Melanoma 4.5%, Bladder 6%, Lung 21%, Prostate 27.5%, Breast 25%, Colon 16%.] 
Type of Cancer    Number of New Cases    Relative Frequency 
Lung              42                     .21 
Breast            50                     .25 
Colon             32                     .16 
Prostate          55                     .275 
Melanoma          9                      .045 
Bladder           12                     .06 
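Because a sector's area is the circle's area times the relative frequency, its central angle is 360 degrees times the relative frequency. A small Python sketch of that arithmetic for the cancer data of Example 2.2b (the dictionary layout and variable names are our own):

```python
# Example 2.2b: type of cancer -> number of new cases
cases = {"Lung": 42, "Breast": 50, "Colon": 32,
         "Prostate": 55, "Melanoma": 9, "Bladder": 12}

n = sum(cases.values())  # number of patients
# Each sector's share of the circle equals its relative frequency,
# so its central angle is 360 degrees times that relative frequency.
angles = {kind: 360 * count / n for kind, count in cases.items()}
for kind, angle in angles.items():
    print(f"{kind:9s} {cases[kind] / n:.3f}  {angle:.1f} degrees")
```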



2.2.3 Grouped Data, Histograms, Ogives, and Stem and Leaf Plots 

As seen in Subsection 2.2.2, using a line or a bar graph to plot the frequencies of data values 
is often an effective way of portraying a data set. However, for some data sets the number 
of distinct values is too large to utilize this approach. Instead, in such cases, it is useful to 
divide the values into groupings, or class intervals, and then plot the number of data values 
falling in each class interval. The number of class intervals chosen should be a trade-off 
between (1) choosing too few classes at a cost of losing too much information about the 
actual data values in a class and (2) choosing too many classes, which will result in the 
frequencies of each class being too small for a pattern to be discernible. Although 5 to 10 
class intervals are typical, the appropriate number is a subjective choice, and of course, you 
can try different numbers of class intervals to see which of the resulting charts appears to 
be most revealing about the data. It is common, although not essential, to choose class 
intervals of equal length. 

TABLE 2.3 Life in Hours of 200 Incandescent Lamps 

Item Lifetimes 

1,067    919  1,196    785  1,126    936    918  1,156    920    948 
  855  1,092  1,162  1,170    929    950    905    972  1,035  1,045 
1,157  1,195  1,195  1,340  1,122    938    970  1,237    956  1,102 
1,022    978    832  1,009  1,157  1,151  1,009    765    958    902 
  923  1,333    811  1,217  1,085    896    958  1,311  1,037    702 
  521    933    928  1,153    946    858  1,071  1,069    830  1,063 
  930    807    954  1,063  1,002    909  1,077  1,021  1,062  1,157 
  999    932  1,035    944  1,049    940  1,122  1,115    833  1,320 
  901  1,324    818  1,250  1,203  1,078    890  1,303  1,011  1,102 
  996    780    900  1,106    704    621    854  1,178  1,138    951 
1,187  1,067  1,118  1,037    958    760  1,101    949    992    966 
  824    653    980    935    878    934    910  1,058    730    980 
  844    814  1,103  1,000    788  1,143    935  1,069  1,170  1,067 
1,037  1,151    863    990  1,035  1,112    931    970    932    904 
1,026  1,147    883    867    990  1,258  1,192    922  1,150  1,091 
1,039  1,083  1,040  1,289    699  1,083    880  1,029    658    912 
1,023    984    856    924    801  1,122  1,292  1,116    880  1,173 
1,134    932    938  1,078  1,180  1,106  1,184    954    824    529 
  998    996  1,133    765    775  1,105  1,081  1,171    705  1,425 
  610    916  1,001    895    709    860  1,110  1,149    972  1,002 

The endpoints of a class interval are called the class boundaries. We will adopt the 
left-end inclusion convention, which stipulates that a class interval contains its left-end but 
not its right-end boundary point. Thus, for instance, the class interval 20-30 contains 
all values that are both greater than or equal to 20 and less than 30. 

Table 2.3 presents the lifetimes of 200 incandescent lamps. A class frequency table for 
the data of Table 2.3 is presented in Table 2.4. The class intervals are of length 100, with 
the first one starting at 500. 

TABLE 2.4 A Class Frequency Table 

Class Interval    Frequency (Number of Data Values in the Interval) 
500-600           2 
600-700           5 
700-800           12 
800-900           25 
900-1000          58 
1000-1100         41 
1100-1200         43 
1200-1300         7 
1300-1400         6 
1400-1500         1 
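Grouping with the left-end inclusion convention amounts to integer division of (value - start) by the class width. Below is a minimal Python sketch that rebuilds Table 2.4 from the lifetimes of Table 2.3, transcribed row by row; the function name `class_frequencies` and its parameters are illustrative, not from the text.

```python
# Lamp lifetimes (hours) from Table 2.3, transcribed row by row
lifetimes = [
    1067, 919, 1196, 785, 1126, 936, 918, 1156, 920, 948,
    855, 1092, 1162, 1170, 929, 950, 905, 972, 1035, 1045,
    1157, 1195, 1195, 1340, 1122, 938, 970, 1237, 956, 1102,
    1022, 978, 832, 1009, 1157, 1151, 1009, 765, 958, 902,
    923, 1333, 811, 1217, 1085, 896, 958, 1311, 1037, 702,
    521, 933, 928, 1153, 946, 858, 1071, 1069, 830, 1063,
    930, 807, 954, 1063, 1002, 909, 1077, 1021, 1062, 1157,
    999, 932, 1035, 944, 1049, 940, 1122, 1115, 833, 1320,
    901, 1324, 818, 1250, 1203, 1078, 890, 1303, 1011, 1102,
    996, 780, 900, 1106, 704, 621, 854, 1178, 1138, 951,
    1187, 1067, 1118, 1037, 958, 760, 1101, 949, 992, 966,
    824, 653, 980, 935, 878, 934, 910, 1058, 730, 980,
    844, 814, 1103, 1000, 788, 1143, 935, 1069, 1170, 1067,
    1037, 1151, 863, 990, 1035, 1112, 931, 970, 932, 904,
    1026, 1147, 883, 867, 990, 1258, 1192, 922, 1150, 1091,
    1039, 1083, 1040, 1289, 699, 1083, 880, 1029, 658, 912,
    1023, 984, 856, 924, 801, 1122, 1292, 1116, 880, 1173,
    1134, 932, 938, 1078, 1180, 1106, 1184, 954, 824, 529,
    998, 996, 1133, 765, 775, 1105, 1081, 1171, 705, 1425,
    610, 916, 1001, 895, 709, 860, 1110, 1149, 972, 1002,
]

def class_frequencies(data, start, width, num_classes):
    """Count the values in each class [start + j*width, start + (j+1)*width),
    j = 0, ..., num_classes - 1. Left-end inclusion: a value equal to a
    boundary goes into the class that starts there. Values outside
    [start, start + num_classes*width) are ignored."""
    counts = [0] * num_classes
    for x in data:
        j = (x - start) // width
        if 0 <= j < num_classes:
            counts[j] += 1
    return counts

counts = class_frequencies(lifetimes, start=500, width=100, num_classes=10)
for j, c in enumerate(counts):
    low = 500 + 100 * j
    print(f"{low}-{low + 100}: {c}")
```

Changing `width` or `num_classes` is a cheap way to try the different groupings discussed above.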



FIGURE 2.5 A frequency histogram. [Vertical axis: number of occurrences; horizontal axis: life in units of 100 hours.] 

FIGURE 2.6 A cumulative frequency plot. [Horizontal axis: lifetimes.] 



A bar graph plot of class data, with the bars placed adjacent to each other, is called 
a histogram. The vertical axis of a histogram can represent either the class frequency or the 
relative class frequency; in the former case the graph is called a frequency histogram and 
in the latter a relative frequency histogram. Figure 2.5 presents a frequency histogram of the 
data in Table 2.4. 

We are sometimes interested in plotting a cumulative frequency (or cumulative relative 
frequency) graph. A point on the horizontal axis of such a graph represents a possible 
data value; its corresponding vertical plot gives the number (or proportion) of the data 
whose values are less than or equal to it. A cumulative relative frequency plot of the data 
of Table 2.3 is given in Figure 2.6. We can conclude from this figure that 100 percent 
of the data values are less than 1,500, approximately 22 percent are less than or equal to 
900, approximately 72 percent are less than or equal to 1,100, and so on. A cumulative 
frequency plot is called an ogive. 
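The heights of an ogive at the class boundaries can be tabulated directly from the class counts of Table 2.4. A short Python sketch, assuming only those counts are available (so it reports the proportion of lifetimes strictly below each class boundary rather than reading values off Figure 2.6):

```python
from itertools import accumulate

# Class counts from Table 2.4; class j covers [500 + 100j, 600 + 100j)
counts = [2, 5, 12, 25, 58, 41, 43, 7, 6, 1]
right_ends = [600 + 100 * j for j in range(len(counts))]
n = sum(counts)

# With left-end inclusion, the running total through class j is the number
# of lifetimes strictly below that class's right end point.
cumulative = list(accumulate(counts))
for b, c in zip(right_ends, cumulative):
    print(f"below {b:>5}: {c:3d}  ({c / n:.3f})")
```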

An efficient way of organizing a small- to moderate-sized data set is to utilize a stem 
and leaf plot. Such a plot is obtained by first dividing each data value into two parts: 
its stem and its leaf. For instance, if the data are all two-digit numbers, then we could let 
the stem part of a data value be its tens digit and let the leaf be its ones digit. Thus, for 
instance, the value 62 is expressed as 

Stem | Leaf 
   6 | 2 

and the two data values 62 and 67 can be represented as 

Stem | Leaf 
   6 | 2,7 
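For nonnegative integer data, splitting a value into its tens part (the stem) and its ones digit (the leaf) is just `divmod(value, 10)`. A minimal Python sketch under that assumption; the function names are hypothetical, chosen for illustration:

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Split each nonnegative integer into stem (value // 10) and
    leaf (value % 10); return {stem: leaves in increasing order}."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)

def show(plot):
    """Print one line per stem, smallest stem first."""
    for stem in sorted(plot):
        print(f"{stem:>3} | {','.join(str(leaf) for leaf in plot[stem])}")

show(stem_and_leaf([62, 67]))   # prints "  6 | 2,7"
```

Other choices of stem (for instance, hundreds of days, as in Example 2.3d) only change the divisor.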






EXAMPLE 2.2c Table 2.5 gives the monthly and yearly average daily minimum temperatures 
in 35 U.S. cities. 

The annual average daily minimum temperatures from Table 2.5 are represented in the 
following stem and leaf plot (stems are in units of 10 degrees). 

7 | 0.0 
6 | 9.0 
5 | 1.0, 1.3, 2.0, 5.5, 7.1, 7.4, 7.6, 8.5, 9.3 
4 | 0.0, 1.0, 2.4, 3.6, 3.7, 4.8, 5.0, 5.2, 6.0, 6.7, 8.1, 9.0, 9.2 
3 | 3.1, 4.1, 5.3, 5.8, 6.2, 9.0, 9.1, 9.5, 9.5 
2 | 9.0, 9.8 



2.3 SUMMARIZING DATA SETS 

Modern-day experiments often deal with huge sets of data. For instance, in an attempt 
to learn about the health consequences of certain common practices, in 1951 the medical 
statisticians R. Doll and A. B. Hill sent questionnaires to all doctors in the United Kingdom 
and received approximately 40,000 replies. Their questions dealt with age, eating habits, 
and smoking habits. The respondents were then tracked for the ensuing 10 years and the 
causes of death for those who died were monitored. To obtain a feel for such a large amount 
of data, it is useful to be able to summarize it by some suitably chosen measures. In this 
section we present some summarizing statistics, where a statistic is a numerical quantity 
whose value is determined by the data. 

2.3.1 Sample Mean, Sample Median, and Sample Mode 

In this section we introduce some statistics that are used for describing the center of a set 
of data values. To begin, suppose that we have a data set consisting of the n numerical 
values $x_1, x_2, \ldots, x_n$. The sample mean is the arithmetic average of these values. 

Definition 

The sample mean, designated by $\bar{x}$, is defined by 

$$\bar{x} = \sum_{i=1}^{n} x_i / n$$ 

The computation of the sample mean can often be simplified by noting that if for constants 
a and b 

$$y_i = a x_i + b, \quad i = 1, \ldots, n$$ 

then the sample mean of the data set $y_1, \ldots, y_n$ is 

$$\bar{y} = \sum_{i=1}^{n} (a x_i + b)/n = \sum_{i=1}^{n} a x_i / n + \sum_{i=1}^{n} b / n = a\bar{x} + b$$ 

TABLE 2.5 Normal Daily Minimum Temperature: Selected Cities 

[In Fahrenheit degrees. Airport data except as noted. Based on standard 30-year period, 1961 through 1990] 

State Station Jan. Feb. Mar. Apr. May June July Aug. Sept. Oct. Nov. Dec. Annual avg. 

AL Mobile 40.0 42.7 50.1 57.1 64.4 70.7 73.2 72.9 68.7 57.3 49.1 43.1 57.4 
AK Juneau 19.0 22.7 26.7 32.1 38.9 45.0 48.1 47.3 42.9 37.2 27.2 22.6 34.1 
AZ Phoenix 41.2 44.7 48.8 55.3 63.9 72.9 81.0 79.2 72.8 60.8 48.9 41.8 59.3 
AR Little Rock 29.1 33.2 42.2 50.7 59.0 67.4 71.5 69.8 63.5 50.9 41.5 33.1 51.0 
CA Los Angeles 47.8 49.3 50.5 52.8 56.3 59.5 62.8 64.2 63.2 59.2 52.8 47.9 55.5 
   Sacramento 37.7 41.4 43.2 45.5 50.3 55.3 58.1 58.0 55.7 50.4 43.4 37.8 48.1 
   San Diego 48.9 50.7 52.8 55.6 59.1 61.9 65.7 67.3 65.6 60.9 53.9 48.8 57.6 
   San Francisco 41.8 45.0 45.8 47.2 49.7 52.6 53.9 55.0 55.2 51.8 47.1 42.7 49.0 
CO Denver 16.1 20.2 25.8 34.5 43.6 52.4 58.6 56.9 47.6 36.4 25.4 17.4 36.2 
CT Hartford 15.8 18.6 28.1 37.5 47.6 56.9 62.2 60.4 51.8 40.7 32.8 21.3 39.5 
DE Wilmington 22.4 24.8 33.1 41.8 52.2 61.6 67.1 65.9 58.2 45.7 37.0 27.6 44.8 
DC Washington 26.8 29.1 37.7 46.4 56.6 66.5 71.4 70.0 62.5 50.3 41.1 31.7 49.2 
FL Jacksonville 40.5 43.3 49.2 54.9 62.1 69.1 71.9 71.8 69.0 59.3 50.2 43.4 57.1 
   Miami 59.2 60.4 64.2 67.8 72.1 75.1 76.2 76.7 75.9 72.1 66.7 61.5 69.0 
GA Atlanta 31.5 34.5 42.5 50.2 58.7 66.2 69.5 69.0 63.5 51.9 42.8 35.0 51.3 
HI Honolulu 65.6 65.4 67.2 68.7 70.3 72.2 73.5 74.2 73.5 72.3 70.3 67.0 70.0 
ID Boise 21.6 27.5 31.9 36.7 43.9 52.1 57.7 56.8 48.2 39.0 31.1 22.5 39.1 
IL Chicago 12.9 17.2 28.5 38.6 47.7 57.5 62.6 61.6 53.9 42.2 31.6 19.1 39.5 
   Peoria 13.2 17.7 29.8 40.8 50.9 60.7 65.4 63.1 55.2 43.1 32.5 19.3 41.0 
IN Indianapolis 17.2 20.9 31.9 41.5 51.7 61.0 65.2 62.8 55.6 43.5 34.1 23.2 42.4 
IA Des Moines 10.7 15.6 27.6 40.0 51.5 61.2 66.5 63.6 54.5 42.7 29.9 16.1 40.0 
KS Wichita 19.2 23.7 33.6 44.5 54.3 64.6 69.9 67.9 59.2 46.6 33.9 23.0 45.0 
KY Louisville 23.2 26.5 36.2 45.4 54.7 62.9 67.3 65.8 58.7 45.8 37.3 28.6 46.0 
LA New Orleans 41.8 44.4 51.6 58.4 65.2 70.8 73.1 72.8 69.5 58.7 51.0 44.8 58.5 
ME Portland 11.4 13.5 24.5 34.1 43.4 52.1 58.3 57.1 48.9 38.3 30.4 17.8 35.8 
MD Baltimore 23.4 25.9 34.1 42.5 52.6 61.8 66.8 65.7 58.4 45.9 37.1 28.2 45.2 
MA Boston 21.6 23.0 31.3 40.2 49.8 59.1 65.1 64.0 56.8 46.9 38.3 26.7 43.6 
MI Detroit 15.6 17.6 27.0 36.8 47.1 56.3 61.3 59.6 52.5 40.9 32.2 21.4 39.0 
   Sault Ste. Marie 4.6 4.8 15.3 28.4 38.4 45.5 51.3 51.3 44.3 36.2 25.9 11.8 29.8 
MN Duluth -2.2 2.8 15.7 28.9 39.6 48.5 55.1 53.3 44.5 35.1 21.5 4.9 29.0 
   Minneapolis-St. Paul 2.8 9.2 22.7 36.2 47.6 57.6 63.1 60.3 50.3 38.8 25.2 10.2 35.3 
MS Jackson 32.7 35.7 44.1 51.9 60.0 67.1 70.5 69.7 63.7 50.3 42.3 36.1 52.0 
MO Kansas City 16.7 21.8 32.6 43.8 53.9 63.1 68.2 65.7 56.9 45.7 33.6 21.9 43.7 
   St. Louis 20.8 25.1 35.5 46.4 56.0 65.7 70.4 67.9 60.5 48.3 37.7 26.0 46.7 
MT Great Falls 11.6 17.2 22.8 31.9 40.9 48.6 53.2 52.2 43.5 35.8 24.3 14.6 33.1 

Source: U.S. National Oceanic and Atmospheric Administration, Climatography of the United States, No. 81. 

EXAMPLE 2.3a The winning scores in the U.S. Masters golf tournament in the years from 
1982 to 1991 were as follows: 

284, 280, 277, 282, 279, 285, 281, 283, 278, 277 

Find the sample mean of these scores. 

SOLUTION Rather than directly adding these values, it is easier to first subtract 280 from 
each one to obtain the new values $y_i = x_i - 280$: 

4, 0, -3, 2, -1, 5, 1, 3, -2, -3 

Because the arithmetic average of the transformed data set is 

$$\bar{y} = 6/10$$ 

it follows that 

$$\bar{x} = \bar{y} + 280 = 280.6$$ ■ 
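The shortcut in Example 2.3a is the identity $\bar{y} = a\bar{x} + b$ with a = 1 and b = -280. A small Python check, using exact fractions (our choice, so that both routes give identical results rather than nearly identical floats); `sample_mean` is an illustrative helper, not from the text:

```python
from fractions import Fraction

# Winning Masters scores, 1982-1991 (Example 2.3a)
scores = [284, 280, 277, 282, 279, 285, 281, 283, 278, 277]

def sample_mean(data):
    """Arithmetic average, kept as an exact fraction."""
    return Fraction(sum(data), len(data))

direct = sample_mean(scores)

# Shortcut: y_i = x_i - 280 (a = 1, b = -280), so x-bar = y-bar + 280
shifted = [x - 280 for x in scores]
via_shift = sample_mean(shifted) + 280

print(float(direct), float(via_shift))  # 280.6 280.6
```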

Sometimes we want to determine the sample mean of a data set that is presented in 
a frequency table listing the k distinct values $v_1, \ldots, v_k$ having corresponding frequencies 
$f_1, \ldots, f_k$. Since such a data set consists of $n = \sum_{i=1}^{k} f_i$ observations, with the value $v_i$ 
appearing $f_i$ times, for each $i = 1, \ldots, k$, it follows that the sample mean of these n data 
values is 

$$\bar{x} = \sum_{i=1}^{k} v_i f_i / n$$ 

By writing the preceding as 

$$\bar{x} = \frac{f_1}{n} v_1 + \frac{f_2}{n} v_2 + \cdots + \frac{f_k}{n} v_k$$ 

we see that the sample mean is a weighted average of the distinct values, where the weight 
given to the value $v_i$ is equal to the proportion of the n data values that are equal to 
$v_i$, $i = 1, \ldots, k$. 
EXAMPLE 2.3b The following is a frequency table giving the ages of members of a symphony 
orchestra for young adults. 

Age    Frequency 
15     2 
16     5 
17     11 
18     9 
19     14 
20     13 

Find the sample mean of the ages of the 54 members of the symphony. 

SOLUTION 

$$\bar{x} = (15 \cdot 2 + 16 \cdot 5 + 17 \cdot 11 + 18 \cdot 9 + 19 \cdot 14 + 20 \cdot 13)/54 \approx 18.24$$ ■ 
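The weighted-average form of the sample mean translates directly into code. A minimal Python sketch applied to the orchestra ages of Example 2.3b, assuming the table is given as a dictionary; `mean_from_frequencies` is a hypothetical helper name:

```python
from fractions import Fraction

# Example 2.3b: age -> number of orchestra members
ages = {15: 2, 16: 5, 17: 11, 18: 9, 19: 14, 20: 13}

def mean_from_frequencies(freq):
    """Weighted average of the distinct values v_i with weights f_i / n."""
    n = sum(freq.values())
    return sum(Fraction(f, n) * v for v, f in freq.items())

m = mean_from_frequencies(ages)
print(m, float(m))  # exact value, then about 18.24
```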

Another statistic used to indicate the center of a data set is the sample median; loosely 
speaking, it is the middle value when the data set is arranged in increasing order. 

Definition 

Order the values of a data set of size n from smallest to largest. If n is odd, the sample 
median is the value in position $(n + 1)/2$; if n is even, it is the average of the values in 
positions $n/2$ and $n/2 + 1$. 

Thus the sample median of a set of three values is the second smallest; of a set of four 
values, it is the average of the second and third smallest. 

EXAMPLE 2.3c Find the sample median for the data described in Example 2.3b. 

SOLUTION Since there are 54 data values, it follows that when the data are put in increasing 
order, the sample median is the average of the values in positions 27 and 28. Because 
2 + 5 + 11 + 9 = 27 members are 18 or younger, the 27th value is 18 and the 28th is 19. 
Thus, the sample median is 18.5. ■ 
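The definition of the sample median can be implemented as written, with the 1-based positions of the text shifted to Python's 0-based indexing. A minimal sketch (the function name is our own):

```python
def sample_median(data):
    """Sample median as defined in the text: the value in position (n+1)/2
    for odd n, the average of positions n/2 and n/2 + 1 for even n."""
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("median of an empty data set")
    if n % 2 == 1:
        return xs[(n + 1) // 2 - 1]           # position (n+1)/2, 1-based
    return (xs[n // 2 - 1] + xs[n // 2]) / 2  # positions n/2 and n/2 + 1

# Expand the frequency table of Example 2.3b into the 54 individual ages
ages = {15: 2, 16: 5, 17: 11, 18: 9, 19: 14, 20: 13}
data = [age for age, f in ages.items() for _ in range(f)]
print(sample_median(data))  # 18.5
```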

The sample mean and sample median are both useful statistics for describing the 
central tendency of a data set. The sample mean makes use of all the data values and 
is affected by extreme values that are much larger or smaller than the others; the sample 
median makes use of only one or two of the middle values and is thus not affected by 
extreme values. Which of them is more useful depends on what one is trying to learn 
from the data. For instance, if a city government has a flat rate income tax and is trying to 
estimate its total revenue from the tax, then the sample mean of its residents' income would 
be a more useful statistic. On the other hand, if the city was thinking about constructing 
middle-income housing, and wanted to determine the proportion of its population able 
to afford it, then the sample median would probably be more useful. 
EXAMPLE 2.3d In a study reported in Hoel, D. G., "A representation of mortality data by 
competing risks," Biometrics, 28, pp. 475-488, 1972, a group of 5-week-old mice were 
each given a radiation dose of 300 rad. The mice were then divided into two groups; 
the first group was kept in a germ-free environment, and the second in conventional 
laboratory conditions. The numbers of days until death were then observed. The data for 
those whose death was due to thymic lymphoma are given in the following stem and leaf 
plots (whose stems are in units of hundreds of days); the first plot is for mice living in the 
germ-free conditions, and the second for mice living under ordinary laboratory conditions. 

Germ-Free Mice 

1 | 58, 92, 93, 94, 95 
2 | 02, 12, 15, 29, 30, 37, 40, 44, 47, 59 
3 | 01, 01, 21, 37 
4 | 15, 34, 44, 85, 96 
5 | 29, 37 
6 | 24 
7 | 07 
8 | 00 

Conventional Mice 

1 | 59, 89, 91, 98 
2 | 35, 45, 50, 56, 61, 65, 66, 80 
3 | 43, 56, 83 
4 | 03, 14, 28, 32 



Determine the sample means and the sample medians for the two sets of mice. 

SOLUTION It is clear from the stem and leaf plots that the sample mean for the set of mice put 
in the germ-free setting is larger than the sample mean for the set of mice in the usual labora- 
tory setting; indeed, a calculation gives that the former sample mean is 344.07, whereas the 
latter one is 292.32. On the other hand, since there are 29 data values for the germ-free mice, 
the sample median is the 15th largest data value, namely, 259; similarly, since there are 
19 data values for the other set of mice, its sample median is the 10th largest data value, 
namely, 265. Thus, whereas the 
sample mean is quite a bit larger for the first data set, the sample medians are approximately 
equal. The reason for this is that whereas the sample mean for the first set is greatly affected 
by the five data values greater than 500, these values have a much smaller effect on the 
sample median. Indeed, the sample median would remain unchanged if these values were 
replaced by any other five values greater than or equal to 259. It appears from the stem and 
leaf plots that the germ-free conditions probably improved the life span of the five longest 
living mice, but it is unclear what, if any, effect it had on the life spans of the other mice. ■ 
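The figures quoted in Example 2.3d can be reproduced with Python's standard `statistics` module. In this sketch the lists are our own transcription of the two stem and leaf plots (stem in hundreds of days plus leaf), so any misreading of the plots would show up as a mismatch with the stated means and medians.

```python
import statistics

# Days until death (thymic lymphoma), read off the stem and leaf plots
germ_free = [158, 192, 193, 194, 195, 202, 212, 215, 229, 230, 237, 240,
             244, 247, 259, 301, 301, 321, 337, 415, 434, 444, 485, 496,
             529, 537, 624, 707, 800]
conventional = [159, 189, 191, 198, 235, 245, 250, 256, 261, 265, 266,
                280, 343, 356, 383, 403, 414, 428, 432]

for name, data in [("germ-free", germ_free), ("conventional", conventional)]:
    print(f"{name}: n={len(data)}, mean={statistics.mean(data):.2f}, "
          f"median={statistics.median(data)}")
```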
Another statistic that has been used to indicate the central tendency of a data set is the 
sample mode, defined to be the value that occurs with the greatest frequency. If no single 
value occurs most frequently, then all the values that occur at the highest frequency are 
called modal values. 

EXAMPLE 2.3e The following frequency table gives the values obtained in 40 rolls of a die. 

Value    Frequency 
1        9 
2        8 
3        5 
4        5 
5        6 
6        7 

Find (a) the sample mean, (b) the sample median, and (c) the sample mode. 

SOLUTION (a) The sample mean is 

$$\bar{x} = (9 + 16 + 15 + 20 + 30 + 42)/40 = 132/40 = 3.3$$ 

(b) The sample median is the average of the 20th and 21st smallest values. Since 
9 + 8 = 17 of the values are at most 2 and 9 + 8 + 5 = 22 are at most 3, both of these 
values are 3, and so the sample median is equal to 3. (c) The sample mode is 1, the value 
that occurred most frequently. ■ 
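All three measures of center for Example 2.3e can be checked with Python's `statistics` module (`multimode` requires Python 3.8 or later). In this sketch the frequency table is first expanded into the 40 individual rolls:

```python
import statistics

# Example 2.3e: face value -> frequency in 40 rolls
rolls = {1: 9, 2: 8, 3: 5, 4: 5, 5: 6, 6: 7}
data = [v for v, f in rolls.items() for _ in range(f)]

mean = statistics.mean(data)          # 132/40
median = statistics.median(data)      # average of the 20th and 21st values
modes = statistics.multimode(data)    # every value with the top frequency
print(len(data), mean, median, modes)
```

Using `multimode` rather than `mode` matches the text's convention that all values tied for the highest frequency are modal values.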

2.3.2 Sample Variance and Sample Standard Deviation 

Whereas we have presented statistics that describe the central tendencies of a data set, 
we are also interested in ones that describe the spread or variability of the data values. 
A statistic that could be used for this purpose would be one that measures the average 
value of the squares of the distances between the data values and the sample mean. This 
is accomplished by the sample variance, which for technical reasons divides the sum of 
the squares of the differences by $n - 1$ rather than n, where n is the size of the data set. 

Definition 

The sample variance, call it $s^2$, of the data set $x_1, \ldots, x_n$ is defined by 

$$s^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 / (n - 1)$$ 

EXAMPLE 2.3f Find the sample variances of the data sets A and B given below. 

A: 3, 4, 6, 7, 10    B: -20, 5, 15, 24 

SOLUTION As the sample mean for data set A is $\bar{x} = (3 + 4 + 6 + 7 + 10)/5 = 6$, it follows 
that its sample variance is 

$$s^2 = [(-3)^2 + (-2)^2 + 0^2 + 1^2 + 4^2]/4 = 7.5$$ 

The sample mean for data set B is also 6; its sample variance is 

$$s^2 = [(-26)^2 + (-1)^2 + 9^2 + 18^2]/3 \approx 360.67$$ 

Thus, although both data sets have the same sample mean, there is a much greater 
variability in the values of the B set than in the A set. ■ 
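A direct implementation of the definition, using exact fractions (our choice) so that the results of Example 2.3f come out exactly; the function name is illustrative:

```python
from fractions import Fraction

def sample_variance(data):
    """s^2 = sum of (x_i - x-bar)^2, divided by n - 1, computed exactly."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    xbar = Fraction(sum(data), n)
    return sum((x - xbar) ** 2 for x in data) / (n - 1)

A = [3, 4, 6, 7, 10]
B = [-20, 5, 15, 24]
print(float(sample_variance(A)), float(sample_variance(B)))
```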

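The following algebraic identity is often useful for computing the sample variance: 

An Algebraic Identity 

$$\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2$$ 

The identity is proven as follows (using $\sum_{i=1}^{n} x_i = n\bar{x}$): 

$$\begin{aligned}
\sum_{i=1}^{n} (x_i - \bar{x})^2 &= \sum_{i=1}^{n} (x_i^2 - 2x_i\bar{x} + \bar{x}^2) \\
&= \sum_{i=1}^{n} x_i^2 - 2\bar{x}\sum_{i=1}^{n} x_i + n\bar{x}^2 \\
&= \sum_{i=1}^{n} x_i^2 - 2n\bar{x}^2 + n\bar{x}^2 \\
&= \sum_{i=1}^{n} x_i^2 - n\bar{x}^2
\end{aligned}$$ 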
The computation of the sample variance can also be eased by noting that if 

$$y_i = a + b x_i, \quad i = 1, \ldots, n$$ 

then $\bar{y} = a + b\bar{x}$, and so 

$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = b^2 \sum_{i=1}^{n} (x_i - \bar{x})^2$$ 

That is, if $s_y^2$ and $s_x^2$ are the respective sample variances, then 

$$s_y^2 = b^2 s_x^2$$ 
In other words, adding a constant to each data value does not change the sample variance, 
whereas multiplying each data value by a constant results in a new sample variance that is 
equal to the old one multiplied by the square of the constant. 
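Both facts are easy to confirm numerically. In this Python sketch the constants a = 5 and b = -3 are arbitrary choices for illustration, and exact fractions make the comparison an equality rather than a tolerance check:

```python
from fractions import Fraction

def sample_variance(data):
    """s^2 = sum of (x_i - x-bar)^2 / (n - 1), computed exactly."""
    n = len(data)
    xbar = Fraction(sum(data), n)
    return sum((x - xbar) ** 2 for x in data) / (n - 1)

x = [3, 4, 6, 7, 10]
a, b = 5, -3                      # arbitrary illustrative constants
y = [a + b * xi for xi in x]      # y_i = a + b x_i

print(sample_variance(x), sample_variance(y))  # 15/2 135/2
```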

EXAMPLE 2.3g The following data give the worldwide number of fatal airline accidents 
of commercially scheduled air transports in the years from 1985 to 1993. 

Year         1985  1986  1987  1988  1989  1990  1991  1992  1993 
Accidents      22    22    26    28    27    25    30    29    24 

Source: Civil Aviation Statistics of the World, annual. 

Find the sample variance of the number of accidents in these years. 

SOLUTION Let us start by subtracting 22 from each value, to obtain the new data set: 

0, 0, 4, 6, 5, 3, 8, 7, 2 

Calling the transformed data $y_1, \ldots, y_9$, we have 

$$\sum_{i=1}^{9} y_i = 35, \qquad \sum_{i=1}^{9} y_i^2 = 16 + 36 + 25 + 9 + 64 + 49 + 4 = 203$$ 

Hence, since the sample variance of the transformed data is equal to that of the original 
data, upon using the algebraic identity we obtain 

$$s^2 = \frac{203 - 9(35/9)^2}{8} \approx 8.361$$ ■ 
Program 2.3 on the text disk can be used to obtain the sample variance for large data 
sets. 
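As a stand-in for that program (which we have not seen), here is a short Python sketch of the computation in Example 2.3g: shift the data, apply the algebraic identity, and cross-check against the definition. Both function names are hypothetical.

```python
from fractions import Fraction

accidents = [22, 22, 26, 28, 27, 25, 30, 29, 24]   # 1985-1993

def variance_shortcut(data):
    """s^2 via the identity: sum of (x_i - x-bar)^2 = sum x_i^2 - n x-bar^2."""
    n = len(data)
    total = sum(data)
    return (sum(x * x for x in data) - Fraction(total * total, n)) / (n - 1)

def variance_definition(data):
    """s^2 computed directly from the definition, for comparison."""
    n = len(data)
    xbar = Fraction(sum(data), n)
    return sum((x - xbar) ** 2 for x in data) / (n - 1)

# Subtracting 22 from every value leaves the sample variance unchanged.
shifted = [x - 22 for x in accidents]
s2 = variance_shortcut(shifted)
print(s2, float(s2))
```

In floating-point code the identity can lose accuracy when the mean is large relative to the spread; shifting the data first, as the example does, helps with that as well.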

The positive square root of the sample variance is called the sample standard deviation. 



Definition 

The quantity s, defined by 

$$s = \sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 / (n - 1)}$$ 

is called the sample standard deviation. 

The sample standard deviation is measured in the same units as the data. 

2.3.3 Sample Percentiles and Box Plots 

Loosely speaking, the sample 100p percentile of a data set is that value such that 100p 
percent of the data values are less than or equal to it, $0 \le p \le 1$. More formally, we have 
the following definition. 
Definition 

The sample 100p percentile is that data value such that 100p percent of the data are less 
than or equal to it and 100(1 - p) percent are greater than or equal to it. If two data values 
satisfy this condition, then the sample 100p percentile is the arithmetic average of these 
two values. 

To determine the sample 100p percentile of a data set of size n, we need to determine 
the data value(s) such that 

1. At least np of the values are less than or equal to it. 

2. At least n(1 - p) of the values are greater than or equal to it. 

To accomplish this, first arrange the data in increasing order. Then, note that if np is not 
an integer, then the only data value that satisfies the preceding conditions is the one whose 
position when the data are ordered from smallest to largest is the smallest integer exceeding 
np. For instance, if n = 22, p = .8, then we require a data value such that at least 17.6 of 
the values are less than or equal to it, and at least 4.4 of them are greater than or equal to 
it. Clearly, only the 18th smallest value satisfies both conditions and this is the sample 80 
percentile. On the other hand, if np is an integer, then it is easy to check that both the 
values in positions np and np + 1 satisfy the preceding conditions, and so the sample 100p 
percentile is the average of these values. 
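A minimal Python sketch of the rule just described. Taking the percentile as an integer `percent` (so that np can be computed exactly with fractions, avoiding float round-off in the "is np an integer?" test) is our own design choice, not something from the text; the populations are those of Table 2.6, used in Example 2.3h.

```python
import math
from fractions import Fraction

def sample_percentile(data, percent):
    """Sample `percent` percentile (integer, 0 < percent < 100). With
    np = n * percent / 100:
      - np not an integer: the value in position ceil(np);
      - np an integer: the average of the values in positions np and np + 1.
    Positions are 1-based, counted in increasing order."""
    xs = sorted(data)
    np_ = Fraction(len(xs) * percent, 100)   # exact arithmetic
    if np_.denominator != 1:
        return xs[math.ceil(np_) - 1]
    k = int(np_)
    return (xs[k - 1] + xs[k]) / 2

# Populations of the 25 largest U.S. cities, 1994 (Table 2.6)
populations = [7333253, 3448613, 2731743, 1702086, 1524249, 1151977,
               1048949, 1022830, 998905, 992038, 816884, 752279, 734676,
               702979, 665070, 635913, 617044, 614289, 579307, 567094,
               547725, 520947, 514013, 504505, 493559]
print(sample_percentile(populations, 10), sample_percentile(populations, 80))
```

With n = 22 and 80 percent, np = 17.6 and the function returns the 18th smallest value, as in the text.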

EXAMPLE 2.3h Table 2.6 lists the populations of the 25 most populous U.S. cities for the 
year 1994. For this data set, find (a) the sample 10 percentile and (b) the sample 80 
percentile. 

SOLUTION (a) Because the sample size is 25 and 25(.10) = 2.5, the sample 10 percentile 
is the third smallest value, equal to 514,013. 

(b) Because 25(.80) = 20, the sample 80 percentile is the average of the twentieth and 
the twenty-first smallest values. Hence, the sample 80 percentile is 

$$\frac{1{,}151{,}977 + 1{,}524{,}249}{2} = 1{,}338{,}113$$ ■ 
The sample 50 percentile is, of course, just the sample median. Along with the sample 
25 and 75 percentiles, it makes up the sample quartiles. 

Definition 

The sample 25 percentile is called the first quartile; the sample 50 percentile is called the 
sample median or the second quartile; the sample 75 percentile is called the third quartile. 

The quartiles break up a data set into four parts, with roughly 25 percent of the data 
being less than the first quartile, 25 percent being between the first and second quartile, 
25 percent being between the second and third quartile, and 25 percent being greater than 
the third quartile. 

TABLE 2.6 Population of 25 Largest U.S. Cities, 1994 

Rank  City                 Population 
 1    New York, NY          7,333,253 
 2    Los Angeles, CA       3,448,613 
 3    Chicago, IL           2,731,743 
 4    Houston, TX           1,702,086 
 5    Philadelphia, PA      1,524,249 
 6    San Diego, CA         1,151,977 
 7    Phoenix, AZ           1,048,949 
 8    Dallas, TX            1,022,830 
 9    San Antonio, TX         998,905 
10    Detroit, MI             992,038 
11    San Jose, CA            816,884 
12    Indianapolis, IN        752,279 
13    San Francisco, CA       734,676 
14    Baltimore, MD           702,979 
15    Jacksonville, FL        665,070 
16    Columbus, OH            635,913 
17    Milwaukee, WI           617,044 
18    Memphis, TN             614,289 
19    El Paso, TX             579,307 
20    Washington, DC          567,094 
21    Boston, MA              547,725 
22    Seattle, WA             520,947 
23    Austin, TX              514,013 
24    Nashville, TN           504,505 
25    Denver, CO              493,559 

EXAMPLE 2.3i Noise is measured in decibels, denoted as dB. One decibel is about the level 
of the weakest sound that can be heard in a quiet surrounding by someone with good 
hearing; a whisper measures about 30 dB; a human voice in normal conversation is about 
70 dB; a loud radio is about 100 dB. Ear discomfort usually occurs at a noise level of about 
120 dB. 

The following data give noise levels measured at 36 different times directly outside of 
Grand Central Station in Manhattan. 

82, 89, 94, 110, 74, 122, 112, 95, 100, 78, 65, 60, 90, 83, 87, 75, 114, 85 
69, 94, 124, 115, 107, 88, 97, 74, 72, 68, 83, 91, 90, 102, 77, 125, 108, 65 



Determine the quartiles. 
FIGURE 2.7 A box plot. 

SOLUTION A stem and leaf plot of the data is as follows: 

 6 | 0,5,5,8,9 
 7 | 2,4,4,5,7,8 
 8 | 2,3,3,5,7,8,9 
 9 | 0,0,1,4,4,5,7 
10 | 0,2,7,8 
11 | 0,2,4,5 
12 | 2,4,5 

The first quartile is 76, the average of the 9th and 10th smallest data values (75 and 77); 
the second quartile is 89.5, the average of the 18th and 19th smallest values; the third 
quartile is 104.5, the average of the 27th and 28th smallest values. ■ 
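The quartiles of the noise data can be obtained from the same percentile rule used earlier. This Python sketch repeats a compact version of that rule so that it runs on its own; the function name and the integer-percent argument are our own choices.

```python
import math
from fractions import Fraction

def sample_percentile(data, percent):
    """Text's rule: position ceil(np) if np is fractional, otherwise the
    average of positions np and np + 1 (1-based, increasing order)."""
    xs = sorted(data)
    np_ = Fraction(len(xs) * percent, 100)
    if np_.denominator != 1:
        return xs[math.ceil(np_) - 1]
    k = int(np_)
    return (xs[k - 1] + xs[k]) / 2

# Noise levels (dB) outside Grand Central Station, Example 2.3i
noise = [82, 89, 94, 110, 74, 122, 112, 95, 100, 78, 65, 60, 90, 83, 87, 75, 114, 85,
         69, 94, 124, 115, 107, 88, 97, 74, 72, 68, 83, 91, 90, 102, 77, 125, 108, 65]
quartiles = [sample_percentile(noise, q) for q in (25, 50, 75)]
print(quartiles)  # [76.0, 89.5, 104.5]
```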

A box plot is often used to plot some of the summarizing statistics of a data set. A straight 
line segment stretching from the smallest to the largest data value is drawn on a horizontal 
axis; imposed on the line is a "box," which starts at the first and continues to the third 
quartile, with the value of the second quartile indicated by a vertical line. For instance, 
the 42 data values presented in Table 2.1 go from a low value of 47 to a high value of 60. 
The value of the first quartile (equal to the value of the 11th smallest on the list) is 50; the 
value of the second quartile (equal to the average of the 21st and 22nd smallest values) is 
51.5; and the value of the third quartile (equal to the value of the 32nd smallest on the 
list) is 54. The box plot for this data set is shown in Figure 2.7. 

The length of the line segment on the box plot, equal to the largest minus the smallest 
data value, is called the range of the data. Also, the length of the box itself, equal to the 
third quartile minus the first quartile, is called the interquartile range. 
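The five values a box plot displays, together with the range and the interquartile range, can be computed from the salary data of Table 2.1. A sketch using the same percentile rule as before (repeated so the block is self-contained); listing 53 with frequency 0 reflects the fact that the other frequencies already sum to 42.

```python
import math
from fractions import Fraction

def sample_percentile(data, percent):
    """Text's rule: position ceil(np) if np is fractional, otherwise the
    average of positions np and np + 1 (1-based, increasing order)."""
    xs = sorted(data)
    np_ = Fraction(len(xs) * percent, 100)
    if np_.denominator != 1:
        return xs[math.ceil(np_) - 1]
    k = int(np_)
    return (xs[k - 1] + xs[k]) / 2

# Table 2.1 expanded into the 42 individual salaries (thousands of dollars)
salary_freq = {47: 4, 48: 1, 49: 3, 50: 5, 51: 8, 52: 10,
               53: 0, 54: 5, 56: 2, 57: 3, 60: 1}
salaries = [v for v, f in salary_freq.items() for _ in range(f)]

q1, q2, q3 = (sample_percentile(salaries, q) for q in (25, 50, 75))
print("min, Q1, median, Q3, max:", min(salaries), q1, q2, q3, max(salaries))
print("range:", max(salaries) - min(salaries), " IQR:", q3 - q1)
```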



2.4 CHEBYSHEV'S INEQUALITY 

Let $\bar{x}$ and s be the sample mean and sample standard deviation of a data set. Assuming that 
s > 0, Chebyshev's inequality states that for any value of $k \ge 1$, greater than $100(1 - 1/k^2)$ 
percent of the data lie within the interval from $\bar{x} - ks$ to $\bar{x} + ks$. Thus, by letting k = 3/2, 
we obtain from Chebyshev's inequality that greater than 100(5/9) = 55.56 percent of the 
data from any data set lies within a distance 1.5s of the sample mean $\bar{x}$; letting k = 2 shows 
that greater than 75 percent of the data lies within 2s of the sample mean; and letting k = 3 
shows that greater than $800/9 \approx 88.9$ percent of the data lies within 3 sample standard 
deviations of $\bar{x}$. 
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When the size of the data set is specified, Chebyshev's inequality can be sharpened, as 
indicated in the following formal statement and proof. 

Chebyshev's Inequality 

Let x̄ and s be the sample mean and sample standard deviation of the data set consisting
of the data x₁, . . . , xₙ, where s > 0. Let

$$S_k = \{i,\ 1 \le i \le n : |x_i - \bar{x}| < ks\}$$

and let N(S_k) be the number of elements in the set S_k. Then, for any k ≥ 1,

$$\frac{N(S_k)}{n} \ge 1 - \frac{n-1}{nk^2} > 1 - \frac{1}{k^2}$$

Proof

$$
\begin{aligned}
(n-1)s^2 &= \sum_{i=1}^{n} (x_i - \bar{x})^2 \\
&= \sum_{i \in S_k} (x_i - \bar{x})^2 + \sum_{i \notin S_k} (x_i - \bar{x})^2 \\
&\ge \sum_{i \notin S_k} (x_i - \bar{x})^2 \\
&\ge \sum_{i \notin S_k} k^2 s^2 \\
&= k^2 s^2 \left(n - N(S_k)\right)
\end{aligned}
$$

where the first inequality follows because all terms being summed are nonnegative, and the
second follows since (xᵢ − x̄)² ≥ k²s² when i ∉ S_k. Dividing both sides of the preceding
inequality by nk²s² yields that

$$\frac{n-1}{nk^2} \ge 1 - \frac{N(S_k)}{n}$$

and the result is proven. □

Because Chebyshev's inequality holds universally, we might expect that, for a given data set,
the actual percentage of data values lying within the interval from x̄ − ks to x̄ + ks will be
quite a bit larger than the bound given by the inequality.
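A quick numerical check of the sharpened bound can be sketched in Python. The helper `chebyshev_check` is hypothetical; it uses the standard-library `statistics.stdev`, which divides by n − 1 as the sample standard deviation s does here, counts N(S_k), and returns N(S_k)/n alongside the bound 1 − (n − 1)/(nk²).

```python
import statistics


def chebyshev_check(data, k):
    """Return (N(S_k)/n, 1 - (n-1)/(n k^2)) for a data set with s > 0 and k >= 1."""
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (divisor n - 1)
    # N(S_k): number of observations strictly within k*s of the sample mean
    n_sk = sum(1 for x in data if abs(x - xbar) < k * s)
    bound = 1 - (n - 1) / (n * k ** 2)
    return n_sk / n, bound


data = [1, 2, 3, 4, 5]  # hypothetical data
for k in (1, 1.5, 2, 3):
    frac, bound = chebyshev_check(data, k)
    print(f"k = {k}: observed {frac:.2f}, lower bound {bound:.3f}")
```

For most data sets the observed fraction sits well above the bound, which is the point made in the preceding paragraph.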

EXAMPLE 2.4a Table 2.7 lists the 10 top-selling passenger cars in the United States in
1999. A simple calculation gives that the sample mean and sample standard deviation of
these data are

x̄ = 298,217.8, s ≈ 83,028.6

TABLE 2.7 Top 10 Selling Cars for 1999

                          1999
 1. Toyota Camry        448,162
 2. Honda Accord        404,192
 3. Ford Taurus         368,327
 4. Honda Civic         318,308
 5. Chevy Cavalier      272,122
 6. Ford Escort         260,486
 7. Toyota Corolla      249,128
 8. Pontiac Grand Am    234,936
 9. Chevy Malibu        218,540
10. Saturn S series     207,977

Thus Chebyshev's inequality yields that at least 100(5/9) ≈ 55.56 percent of the data lie
in the interval

$$\left(\bar{x} - \tfrac{3}{2}s,\ \bar{x} + \tfrac{3}{2}s\right) = (173{,}674.9,\ 422{,}760.7)$$

whereas, in actuality, 90 percent of the data fall within those limits. ■
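The figures in Example 2.4a can be reproduced with a short Python sketch that uses only the standard library (`statistics.mean`, and `statistics.stdev` with divisor n − 1). Running it gives s ≈ 83,028.6, so the half-width of the interval is (3/2)s ≈ 124,542.9. Variable names are illustrative.

```python
import statistics

# 1999 sales of the ten cars in Table 2.7
sales_1999 = [448_162, 404_192, 368_327, 318_308, 272_122,
              260_486, 249_128, 234_936, 218_540, 207_977]

xbar = statistics.mean(sales_1999)
s = statistics.stdev(sales_1999)      # sample standard deviation, divisor n - 1
k = 1.5
lo, hi = xbar - k * s, xbar + k * s    # the interval (xbar - ks, xbar + ks)

observed = sum(1 for x in sales_1999 if lo < x < hi) / len(sales_1999)
chebyshev_bound = 1 - 1 / k ** 2       # 5/9 when k = 3/2

print(f"xbar = {xbar:,.1f}, s = {s:,.1f}")
print(f"interval = ({lo:,.1f}, {hi:,.1f})")
print(f"observed fraction {observed:.2f} vs. Chebyshev bound {chebyshev_bound:.4f}")
```

Only the Toyota Camry, at 448,162, lies outside the interval, which is why the observed fraction is 9/10.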

Suppose now that we are interested in the fraction of data values that exceed the sample 
mean by at least k sample standard deviations, where k is positive. That is, suppose that x̄
and s are the sample mean and the sample standard deviation of the data set x₁, x₂, . . . , xₙ.
Then, with

N(k) = number of i : xᵢ − x̄ ≥ ks

what can we say about N(k)/n? Clearly,

$$\frac{N(k)}{n} \le \frac{\text{number of } i : |x_i - \bar{x}| \ge ks}{n} \le \frac{1}{k^2} \qquad \text{by Chebyshev's inequality}$$

However, we can make a stronger statement, as is shown in the following one-sided version 
of Chebyshev's inequality. 

The One-Sided Chebyshev Inequality 

For k > 0,

$$\frac{N(k)}{n} \le \frac{1}{1+k^2}$$
Proof

Let yᵢ = xᵢ − x̄, i = 1, . . . , n. For any b > 0, we have that

$$\sum_{i=1}^{n} (y_i + b)^2 \ge \sum_{i:\, y_i \ge ks} (y_i + b)^2 \ge \sum_{i:\, y_i \ge ks} (ks + b)^2 = N(k)(ks + b)^2 \tag{2.4.1}$$

where the first inequality follows because (yᵢ + b)² ≥ 0, and the second because both ks
and b are positive. However,

$$\sum_{i=1}^{n} (y_i + b)^2 = \sum_{i=1}^{n} \left(y_i^2 + 2by_i + b^2\right) = \sum_{i=1}^{n} y_i^2 + 2b\sum_{i=1}^{n} y_i + nb^2 = (n-1)s^2 + nb^2$$

where the final equation used that Σ yᵢ = Σ (xᵢ − x̄) = Σ xᵢ − nx̄ = 0, all sums running
over i = 1, . . . , n. Therefore, we obtain from Equation (2.4.1) that

$$N(k) \le \frac{(n-1)s^2 + nb^2}{(ks + b)^2}$$

implying, since n − 1 < n, that

$$\frac{N(k)}{n} \le \frac{s^2 + b^2}{(ks + b)^2}$$

Because the preceding is valid for all b > 0, we can set b = s/k (which is the value of b
that minimizes the right-hand side of the preceding) to obtain that

$$\frac{N(k)}{n} \le \frac{s^2 + s^2/k^2}{(ks + s/k)^2}$$

Multiplying the numerator and the denominator of the right side of the preceding by k²/s²
gives

$$\frac{N(k)}{n} \le \frac{k^2 + 1}{(k^2 + 1)^2} = \frac{1}{k^2 + 1}$$
and the result is proven. Thus, for instance, where the usual Chebyshev inequality shows 
that at most 25 percent of data values are at least 2 standard deviations greater than 
the sample mean, the one-sided Chebyshev inequality lowers the bound to "at most 
20 percent." ■ 
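The two bounds can be compared numerically. The Python sketch below (the helper name `one_sided_check` and the sample data are hypothetical) counts N(k), the number of observations with xᵢ − x̄ ≥ ks, and reports N(k)/n next to the one-sided bound 1/(1 + k²) and the two-sided bound 1/k².

```python
import statistics


def one_sided_check(data, k):
    """Return (N(k)/n, 1/(1 + k^2), 1/k^2) for k > 0 and a data set with s > 0."""
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (divisor n - 1)
    n_k = sum(1 for x in data if x - xbar >= k * s)  # N(k)
    return n_k / n, 1 / (1 + k ** 2), 1 / k ** 2


data = [0, 0, 0, 0, 10]  # hypothetical, strongly right-skewed data
for k in (0.5, 1, 2):
    frac, one_sided, two_sided = one_sided_check(data, k)
    print(f"k = {k}: N(k)/n = {frac:.2f}, one-sided bound {one_sided:.2f}, "
          f"two-sided bound {two_sided:.2f}")
```

For k ≤ 1 the two-sided bound 1/k² is at least 1 and says nothing, while 1/(1 + k²) still gives a nontrivial limit.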
2.5 NORMAL DATA SETS

Many of the large data sets observed in practice have histograms that are similar in shape. 
These histograms often reach their peaks at the sample median and then decrease on both 
sides of this point in a bell-shaped symmetric fashion. Such data sets are said to be normal 
and their histograms are called normal histograms. Figure 2.8 is the histogram of a normal 
data set. 

If the histogram of a data set is close to being a normal histogram, then we say that 
the data set is approximately normal. For instance, we would say that the histogram given 
in Figure 2.9 is from an approximately normal data set, whereas the ones presented in 
Figures 2.10 and 2.11 are not (because each is too nonsymmetric). Any data set that is
not approximately symmetric about its sample median is said to be skewed. It is "skewed 
to the right" if it has a long tail to the right and "skewed to the left" if it has a long tail 
to the left. Thus the data set presented in Figure 2.10 is skewed to the left and the one of 
Figure 2.11 is skewed to the right. 

It follows from the symmetry of the normal histogram that a data set that is approxi- 
mately normal will have its sample mean and sample median approximately equal. 



FIGURE 2.8 Histogram of a normal data set. 






FIGURE 2.9 Histogram of an approximately normal data set. 




FIGURE 2.10 Histogram of a data set skewed to the left.




FIGURE 2.11 Histogram of a data set skewed to the right.



Suppose that x̄ and s are the sample mean and sample standard deviation of an approximately
normal data set. The following rule, known as the empirical rule, specifies the
approximate proportions of the data observations that are within s, 2s, and 3s of the
sample mean x̄.

The Empirical Rule

If a data set is approximately normal with sample mean x̄ and sample standard deviation
s, then the following statements are true.

1. Approximately 68 percent of the observations lie within

x̄ ± s

2. Approximately 95 percent of the observations lie within

x̄ ± 2s

3. Approximately 99.7 percent of the observations lie within

x̄ ± 3s

EXAMPLE 2.5a The following stem and leaf plot gives the scores on a statistics exam taken 
by industrial engineering students. 



9 | 0, 1, 4
8 | 3, 5, 5, 7, 8
7 | 2, 4, 4, 5, 7, 7, 8
6 | 0, 2, 3, 4, 6, 6
5 | 2, 5, 5, 6, 8
4 | 3, 6



By standing the stem and leaf plot on its side we can see that the corresponding histogram 
is approximately normal. Use it to assess the empirical rule. 

SOLUTION A calculation gives that 

x̄ ≈ 70.571, s ≈ 14.354

Thus the empirical rule states that approximately 68 percent of the data are between 56.2
and 84.9; the actual percentage is 1,500/28 ≈ 53.6. Similarly, the empirical rule gives that
approximately 95 percent of the data are between 41.86 and 99.28, whereas the actual
percentage is 100. ■
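The calculation in Example 2.5a can be checked with a short Python sketch. It assumes the stems of the plot run from 9 (top row) down to 4 (bottom row), which reproduces the quoted x̄ ≈ 70.571 and s ≈ 14.354, and it counts the scores within one, two, and three sample standard deviations of x̄. Names such as `stem_leaf` are illustrative.

```python
import statistics

# Stem-and-leaf plot of Example 2.5a: stem (tens digit) -> leaves (units digits).
stem_leaf = {
    9: [0, 1, 4],
    8: [3, 5, 5, 7, 8],
    7: [2, 4, 4, 5, 7, 7, 8],
    6: [0, 2, 3, 4, 6, 6],
    5: [2, 5, 5, 6, 8],
    4: [3, 6],
}
scores = [10 * stem + leaf for stem, leaves in stem_leaf.items() for leaf in leaves]

xbar = statistics.mean(scores)
s = statistics.stdev(scores)  # sample standard deviation (divisor n - 1)

# Observed fraction of scores within m sample standard deviations of the mean
observed = {m: sum(1 for x in scores if abs(x - xbar) <= m * s) / len(scores)
            for m in (1, 2, 3)}
empirical_rule = {1: 0.68, 2: 0.95, 3: 0.997}

print(f"n = {len(scores)}, xbar = {xbar:.3f}, s = {s:.3f}")
for m in (1, 2, 3):
    print(f"within {m}s: observed {observed[m]:.3f}, empirical rule about {empirical_rule[m]}")
```

With only 28 scores, departures of this size from the empirical rule are not surprising.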

A data set that is obtained by sampling from a population that is itself made up of 
subpopulations of different types is usually not normal. Rather, the histogram from such 
a data set often appears to resemble a combination, or superposition, of normal histograms
and thus will often have more than one local peak or hump. Because the histogram will 
be higher at these local peaks than at their neighboring values, these peaks are similar to 
modes. A data set whose histogram has two local peaks is said to be bimodal. The data set 
represented in Figure 2.12 is bimodal. 

2.6 PAIRED DATA SETS AND THE SAMPLE 
CORRELATION COEFFICIENT 

We are often concerned with data sets that consist of pairs of values that have some 
relationship to each other. If each element in such a data set has an x value and a y value,
then we represent the ith data point by the pair (xᵢ, yᵢ). For instance, in an attempt to
determine the relationship between the daily midday temperature (measured in degrees
Celsius) and the number of defective parts produced during that day, a company recorded
the data presented in Table 2.8. For this data set, xᵢ represents the temperature in degrees
Celsius and yᵢ the number of defective parts produced on day i.
FIGURE 2.12 Histogram of a bimodal data set.



TABLE 2.8 Temperature and Defect Data

Day    Temperature (°C)    Number of Defects
 1          24.2                  25
 2          22.7                  31
 3          30.5                  36
 4          28.6                  33
 5          25.5                  19
 6          32.0                  24
 7          28.6                  27
 8          26.5                  25
 9          25.3                  16
10          26.0                  14
11          24.4                  22
12          24.8                  23
13          20.6                  20
14          25.1                  25
15          21.4                  25
16          23.7                  23
17          23.9                  27
18          25.2                  30
19          27.4                  33
20          28.3                  32
21          28.8                  35
22          26.6                  24



A useful way of portraying a data set of paired values is to plot the data on a two-dimensional
graph, with the x-axis representing the x value of the data and the y-axis
representing the y value. Such a plot is called a scatter diagram. Figure 2.13 presents a
scatter diagram for the data of Table 2.8.
FIGURE 2.13 A scatter diagram.



A question of interest concerning paired data sets is whether large x values tend to be 
paired with large y values, and small x values with small y values; if this is not the case, 
then we might question whether large values of one of the variables tend to be paired 
with small values of the other. A rough answer to these questions can often be provided 
by the scatter diagram. For instance, Figure 2.13 indicates that there appears to be some 
connection between high temperatures and large numbers of defective items. To obtain 
a quantitative measure of this relationship, we now develop a statistic that attempts to 
measure the degree to which larger x values go with larger y values and smaller x values 
with smaller y values. 

Suppose that the data set consists of the paired values (xᵢ, yᵢ), i = 1, . . . , n. To obtain
a statistic that can be used to measure the association between the individual values of a
set of paired data, let x̄ and ȳ denote the sample means of the x values and the y values,
respectively. For data pair i, consider xᵢ − x̄, the deviation of its x value from the sample
mean, and yᵢ − ȳ, the deviation of its y value from the sample mean. Now if xᵢ is a large
x value, then it will be larger than the average value of all the x's, so the deviation xᵢ − x̄
will be a positive value. Similarly, when xᵢ is a small x value, then the deviation xᵢ − x̄ will
be a negative value. Because the same statements are true about the y deviations, we can
conclude the following:

When large values of the x variable tend to be associated with large values
of the y variable and small values of the x variable tend to be associated
with small values of the y variable, then the signs, either positive or
negative, of xᵢ − x̄ and yᵢ − ȳ will tend to be the same.

Now, if xᵢ − x̄ and yᵢ − ȳ both have the same sign (either positive or negative), then
their product (xᵢ − x̄)(yᵢ − ȳ) will be positive. Thus, it follows that when large x values
tend to be associated with large y values and small x values are associated with small y
values, then Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) will tend to be a large positive number. [In fact, not
only will all the products have a positive sign when large (small) x values are paired with
large (small) y values, but it also follows from a mathematical result known as Hardy's
lemma that the largest possible value of the sum of paired products will be obtained when
the largest xᵢ − x̄ is paired with the largest yᵢ − ȳ, the second largest xᵢ − x̄ is paired with
the second largest yᵢ − ȳ, and so on.] In addition, it similarly follows that when large values
of xᵢ tend to be paired with small values of yᵢ, then the signs of xᵢ − x̄ and yᵢ − ȳ will be
opposite and so Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) will be a large negative number.

To determine what it means for Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) to be "large," we standardize
this sum first by dividing by n − 1 and then by dividing by the product of the two sample
standard deviations. The resulting statistic is called the sample correlation coefficient.

Definition 

Let s_x and s_y denote, respectively, the sample standard deviations of the x values and the
y values. The sample correlation coefficient, call it r, of the data pairs (xᵢ, yᵢ), i = 1, . . . , n,
is defined by

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n-1)\, s_x s_y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

When r > 0 we say that the sample data pairs are positively correlated, and when r < 0 we
say that they are negatively correlated.

The following are properties of the sample correlation coefficient. 
Properties of r

1. −1 ≤ r ≤ 1

2. If for constants a and b, with b > 0,

yᵢ = a + bxᵢ, i = 1, . . . , n

then r = 1.

3. If for constants a and b, with b < 0,

yᵢ = a + bxᵢ, i = 1, . . . , n

then r = −1.

4. If r is the sample correlation coefficient for the data pairs (xᵢ, yᵢ), i = 1, . . . , n, then it
is also the sample correlation coefficient for the data pairs

(a + bxᵢ, c + dyᵢ), i = 1, . . . , n

provided that b and d are both positive or both negative.

Property 1 says that the sample correlation coefficient r is always between −1 and +1.
Property 2 says that r will equal +1 when there is a straight line (also called a linear) relation
between the paired data such that large y values are attached to large x values. Property 3
says that r will equal −1 when the relation is linear and large y values are attached to small
x values. Property 4 states that the value of r is unchanged when a constant is added to each
of the x variables (or to each of the y variables) or when each x variable (or each y variable)
is multiplied by a positive constant. This property implies that r does not depend on the
units chosen to measure the data. For instance, the sample correlation coefficient
between a person's height and weight does not depend on whether the height is measured
in feet or in inches nor whether the weight is measured in pounds or in kilograms. Also, if
one of the values in the pair is temperature, then the sample correlation coefficient is the
same whether it is measured in Fahrenheit or in Celsius.
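Properties 2 through 4 are easy to confirm numerically. The Python sketch below computes r from the square-root form of the definition; the function name `sample_corr` and the small data set are illustrative assumptions, not taken from the text.

```python
import math


def sample_corr(xs, ys):
    """Sample correlation coefficient r of the pairs (xs[i], ys[i])."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
r = sample_corr(xs, ys)                               # 0.8 for these pairs

r_up = sample_corr(xs, [3 + 2 * x for x in xs])       # Property 2: b > 0 gives +1
r_down = sample_corr(xs, [10 - 0.5 * x for x in xs])  # Property 3: b < 0 gives -1

# Property 4: rescale x as if converting Celsius to Fahrenheit (32 + 1.8x)
# and multiply y by a positive constant; r is unchanged.
r_rescaled = sample_corr([32 + 1.8 * x for x in xs], [100 * y for y in ys])
# If exactly one of b, d is negative, the sign of r flips instead.
r_flipped = sample_corr([-x for x in xs], ys)

print(r, r_up, r_down, r_rescaled, r_flipped)
```

The last line illustrates why Property 4 requires b and d to have the same sign.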

The absolute value of the sample correlation coefficient r (that is, |r|, its value without
regard to its sign) is a measure of the strength of the linear relationship between the x and
the y values of a data pair. A value of |r| equal to 1 means that there is a perfect linear
relation; that is, a straight line can pass through all the data points (xᵢ, yᵢ), i = 1, . . . , n.
A value of |r| of around .8 means that the linear relation is relatively strong; although there
is no straight line that passes through all of the data points, there is one that is "close" to
them all. A value for |r| of around .3 means that the linear relation is relatively weak.
FIGURE 2.14 Sample correlation coefficients.



The sign of r gives the direction of the relation. It is positive when the linear relation is 
such that smaller y values tend to go with smaller x values and larger y values with larger x 
values (and so a straight line approximation points upward), and it is negative when larger 
y values tend to go with smaller x values and smaller y values with larger x values (and so 
a straight line approximation points downward). Figure 2.14 displays scatter diagrams for 
data sets with various values of r. 

EXAMPLE 2.6a Find the sample correlation coefficient for the data presented in Table 2.8. 
SOLUTION A computation gives the solution

r = .4189



thus indicating a relatively weak positive correlation between the daily temperature and 
the number of defective items produced that day. ■ 
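The computation in Example 2.6a can be sketched in Python with the data of Table 2.8. The helper `sample_corr` is an illustrative implementation of the square-root form of the definition; Python 3.10 and later also provide `statistics.correlation`, which computes the same Pearson coefficient.

```python
import math

# Table 2.8: midday temperature (degrees Celsius) and number of defects, days 1-22
temperature = [24.2, 22.7, 30.5, 28.6, 25.5, 32.0, 28.6, 26.5, 25.3, 26.0, 24.4,
               24.8, 20.6, 25.1, 21.4, 23.7, 23.9, 25.2, 27.4, 28.3, 28.8, 26.6]
defects = [25, 31, 36, 33, 19, 24, 27, 25, 16, 14, 22,
           23, 20, 25, 25, 23, 27, 30, 33, 32, 35, 24]


def sample_corr(xs, ys):
    """Sample correlation coefficient r of the pairs (xs[i], ys[i])."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = sample_corr(temperature, defects)
print(f"r = {r:.4f}")
```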
FIGURE 2.15 Scatter diagram of years in school and pulse rate.



EXAMPLE 2.6b The following data give the resting pulse rates (in beats per minute) and 
the years of schooling of 10 individuals. A scatter diagram of these data is presented in 
Figure 2.15. The sample correlation coefficient for these data is r = —.7638. This negative 
correlation indicates that for this data set a high pulse rate is strongly associated with 
a small number of years in school, and a low pulse rate with a large number of years in 
school. ■ 



Person             1    2    3    4    5    6    7    8    9   10
Years of School   12   16   13   18   19   12   18   19   12   14
Pulse Rate        73   67   74   63   73   84   60   62   76   71
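The value r = −.7638 in Example 2.6b can be checked with a Python sketch that uses the first form of the definition, dividing the sum of cross products by (n − 1)s_x s_y. The function name `sample_corr_first_form` is hypothetical.

```python
import statistics

years_of_school = [12, 16, 13, 18, 19, 12, 18, 19, 12, 14]
pulse_rate = [73, 67, 74, 63, 73, 84, 60, 62, 76, 71]


def sample_corr_first_form(xs, ys):
    """r = sum((x - xbar)(y - ybar)) / ((n - 1) s_x s_y)."""
    n = len(xs)
    xbar, ybar = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    # statistics.stdev uses divisor n - 1, matching s_x and s_y in the text
    return sxy / ((n - 1) * statistics.stdev(xs) * statistics.stdev(ys))


r = sample_corr_first_form(years_of_school, pulse_rate)
print(f"r = {r:.4f}")
```

Because (n − 1)s_x s_y equals the square root of the product of the two sums of squared deviations, this form gives the same r as the square-root form.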



Correlation Measures Association, Not Causation 

The results of Example 2.6b indicate a strong negative correlation 
between an individual's years of education and that individual's rest- 
ing pulse rate. However, this does not imply that additional years of 
school will directly reduce one's pulse rate. That is, whereas additional 
years of school tend to be associated with a lower resting pulse rate, this 
does not mean that it is a direct cause of it. Often, the explanation for 
such an association lies with an unexpressed factor that is related to both 
variables under consideration. In this instance, it may be that a person 
who has spent additional time in school is more aware of the latest find- 
ings in the area of health, and thus may be more aware of the importance 
of exercise and good nutrition; or it may be that it is not knowledge that
is making the difference but rather it is that people who have had more 
education tend to end up in jobs that allow them more time for exercise 
and money for good nutrition. The strong negative correlation between 
years in school and resting pulse rate probably results from a combination 
of these as well as other underlying factors. 



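We will now prove the first three properties of the sample correlation coefficient r. That
is, we will prove that |r| ≤ 1 with equality when the data lie on a straight line. To begin,
note that

$$\sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} - \frac{y_i - \bar{y}}{s_y} \right)^2 \ge 0 \tag{2.6.1}$$

or

$$\sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{s_x^2} + \sum_{i=1}^{n} \frac{(y_i - \bar{y})^2}{s_y^2} - 2\sum_{i=1}^{n} \frac{(x_i - \bar{x})(y_i - \bar{y})}{s_x s_y} \ge 0$$

or

$$n - 1 + n - 1 - 2(n-1)r \ge 0$$

showing that

$$r \le 1$$

Note also that r = 1 if and only if there is equality in Equation (2.6.1). That is, r = 1 if
and only if for all i,

$$\frac{y_i - \bar{y}}{s_y} = \frac{x_i - \bar{x}}{s_x}$$

or, equivalently,

$$y_i = \bar{y} + \frac{s_y}{s_x}\left(x_i - \bar{x}\right)$$

That is, r = 1 if and only if the data values (xᵢ, yᵢ) lie on a straight line having a positive
slope.

To show that r ≥ −1, with equality if and only if the data values (xᵢ, yᵢ) lie on a straight
line having a negative slope, start with

$$\sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} + \frac{y_i - \bar{y}}{s_y} \right)^2 \ge 0$$

and use an argument analogous to the one just given.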
Problems

1. The following is a sample of prices, rounded to the nearest cent, charged per gallon 
of standard unleaded gasoline in the San Francisco Bay area in June 1997. 

137, 139, 141, 137, 144, 141, 139, 137, 144, 141, 143, 143, 141 

Represent these data in 

(a) a frequency table; 

(b) a relative frequency line graph. 

2. Explain how a pie chart can be constructed. If a data value had relative frequency 
r, at what angle would the lines defining its sector meet? 

3. The following are the estimated oil reserves, in billions of barrels, for four regions 
in the western hemisphere. 

United States 38.7 

South America 22.6 

Canada 8.8 

Mexico 60.0 

Represent these data in a pie chart. 

4. The following table gives the average travel time to work for workers in each of the 
50 states as well as the percentage of those workers who use public transportation. 

(a) Represent the data relating to the average travel times in a histogram. 

(b) Represent the data relating to the percentage of workers using public 
transportation in a stem and leaf plot. 





Region, Division,       Percent Using Public    Average Travel Time
and State               Transportation          to Work¹ (minutes)

United States           5.3         22.4
Northeast               12.8        24.5
New England             5.1         21.5
Maine                   0.9         19.0
New Hampshire           0.7         21.9
Vermont                 0.7         18.0
Massachusetts           8.3         22.7
Rhode Island            2.5         19.2
Connecticut             3.9         21.1
Middle Atlantic         15.7        25.7
New York                24.8        28.6
New Jersey              8.8         25.3
Pennsylvania            6.4         21.6
Midwest                 3.5         20.7
East North Central      4.3         21.7
Ohio                    2.5         20.7
Indiana                 1.3         20.4
Illinois                10.1        25.1
Michigan                1.6         21.2
Wisconsin               2.5         18.3
West North Central      1.9         18.4
Minnesota               3.6         19.1
Iowa                    1.2         16.2
Missouri                2.0         21.6
North Dakota            0.6         13.0
South Dakota            0.3         13.8
Nebraska                1.2         15.8
Kansas                  0.6         17.2
South                   2.6         22.0
South Atlantic          3.4         22.5
Delaware                2.4         20.0
Maryland                8.1         27.0
Virginia                4.0         24.0
West Virginia           1.1         21.0
North Carolina          1.0         19.8
South Carolina          1.1         20.5
Georgia                 2.8         22.7
Florida                 2.0         21.8
East South Central      1.2         21.1
Kentucky                1.6         20.7
Tennessee               1.3         21.5
Alabama                 0.8         21.2
Mississippi             0.8         20.6
West South Central      2.0         21.6
Arkansas                0.5         19.0
Louisiana               3.0         22.3
Oklahoma                0.6         19.3
Texas                   2.2         22.2
West                    4.1         22.7
Mountain                2.1         19.7
Montana                 0.6         14.8
Idaho                   1.9         17.3
Wyoming                 1.4         15.4
Colorado                2.9         20.7
New Mexico              1.0         19.1
Arizona                 2.1         21.6
Utah                    2.3         18.9
Nevada                  2.7         19.8
Pacific                 4.8         23.8
Washington              4.5         22.0
Oregon                  3.4         19.6
California              4.9         24.6
Alaska                  2.4         16.7
Hawaii                  7.4         23.8

¹ Excludes persons who worked at home.
Source: U.S. Bureau of the Census. Census of Population and Housing, 1990.



5. Choose a book or article and count the number of words in each of the first 100
sentences. Present the data in a stem and leaf plot. Now choose another book or
article, by a different author, and do the same. Do the two stem and leaf plots look
similar? Do you think this could be a viable method for telling whether different
articles were written by different authors?

6. The following table gives the number of commercial airline accidents and fatalities
in the United States in the years from 1980 to 1995.

(a) Represent the number of yearly airline accidents in a frequency table. 

(b) Give a frequency polygon graph of the number of yearly airline accidents. 

(c) Give a cumulative relative frequency plot of the number of yearly airline 
accidents. 

(d) Find the sample mean of the number of yearly airline accidents. 

(e) Find the sample median of the number of yearly airline accidents. 
U.S. Airline Safety, Scheduled Commercial Carriers, 1980-1995

       Departures   Fatal       Fatalities   Fatal Accidents per
Year   (millions)   Accidents                100,000 Departures
1980   5.4          0           0            0.000
1981   5.2          4           4            0.077
1982   5.0          4           233          0.060
1983   5.0          4           5            0.079
1984   5.4          1           4            0.018
1985   5.8          4           197          0.069
1986   6.4          2           5            0.016
1987   6.6          4           231          0.046
1988   6.7          3           285
1989   6.6          11          278
1990   6.9          6           39
1991   6.8          4           62
1992   7.1          4           33
1993   7.2          1           1
1994   7.5          4           239
1995   8.1          2           166



Source: National Transportation Safety Board. 

(f) Find the sample mode of the number of yearly airline accidents.

(g) Find the sample standard deviation of the number of yearly airline accidents. 

7. (Use the table from Problem 6.) 

(a) Represent the number of yearly airline fatalities in a histogram. 

(b) Represent the number of yearly airline fatalities in a stem and leaf plot. 

(c) Find the sample mean of the number of yearly airline fatalities. 

(d) Find the sample median of the number of yearly airline fatalities. 

(e) Find the sample standard deviation of the number of yearly airline fatalities. 

8. The following table gives the winning scores in the Masters golf tournament for 
the years from 1967 to 2002. Use it 

(a) to construct a stem and leaf plot, and 

(b) to find the sample median of the winning scores in these years. 



Year   Player                  Score
1967   Gay Brewer, Jr.         280
1968   Bob Goalby              277
1969   George Archer           281
1970   Billy Casper (69)       279
1971   Charles Coody           279
1972   Jack Nicklaus           286
1973   Tommy Aaron             283
1974   Gary Player             278
1975   Jack Nicklaus           276
1976   Ray Floyd               271
1977   Tom Watson              276
1978   Gary Player             277
1979   Fuzzy Zoeller           280
1980   Seve Ballesteros        275
1981   Tom Watson              280
1982   Craig Stadler           284
1983   Seve Ballesteros        280
1984   Ben Crenshaw            277
1985   Bernhard Langer         282
1986   Jack Nicklaus           279
1987   Larry Mize              285
1988   Sandy Lyle              281
1989   Nick Faldo              283
1990   Nick Faldo              278
1991   Ian Woosnam             277
1992   Fred Couples            275
1993   Bernhard Langer         277
1994   Jose Maria Olazabal     279
1995   Ben Crenshaw            274
1996   Nick Faldo              276
1997   Tiger Woods             270
1998   Mark O'Meara            279
1999   Jose Maria Olazabal     280
2000   Vijay Singh             278
2001   Tiger Woods             272
2002   Tiger Woods             276



9. Using the table given in Problem 4, find the sample mean and sample median of 
the average travel time for those states in the 

(a) northeast; 

(b) midwest; 

(c) south; 

(d) west. 

10. The following data are the median prices for single-family homes in a variety of 
American cities for the years 1992 and 1994. 

(a) Represent the 1992 data in a histogram. 

(b) Represent the 1994 data in a stem and leaf plot. 
Median Price of Existing Single-Family Homes

City                      Apr. 1992    Apr. 1994
Akron, OH                 $75,500      $81,600
Albuquerque, NM           86,700       103,100
Anaheim/Santa Ana, CA     235,100      209,500
Atlanta, GA               85,800       93,200
Baltimore, MD             111,500      115,700
Baton Rouge, LA           71,800       78,400
Birmingham, AL            89,500       99,500
Boston, MA                168,200      170,600
Bradenton, FL             80,400       86,400
Buffalo, NY               79,700       82,400
Charleston, SC            82,000       91,300
Chicago, IL               131,100      135,500
Cincinnati, OH            87,500       93,600
Cleveland, OH             88,100       94,200
Columbia, SC              85,100       82,900
Columbus, OH              90,300       92,800
Corpus Christi, TX        62,500       71,700
Dallas, TX                90,500       95,100
Daytona Beach, FL         63,600       66,200
Denver, CO                91,300       111,200
Des Moines, IA            71,200       77,400
Detroit, MI               77,500       84,500
El Paso, TX               65,900       73,600
Grand Rapids, MI          73,000       76,600
Hartford, CT              141,500      132,900
Honolulu, HI              342,000      355,000
Houston, TX               78,200       84,800
Indianapolis, IN          80,100       90,500
Jacksonville, FL          75,100       79,700
Kansas City, MO           76,100       84,900
Knoxville, TN             78,300       88,600
Las Vegas, NV             101,400      110,400
Los Angeles, CA           218,000      188,500



Source: National Association of Realtors: Data as of midyear 1994. 

(c) Find the sample median of these median prices for 1992. 

(d) Find the sample median of these median prices for 1994. 

11. The following table gives the number of pedestrians, classified according to age 
group and sex, killed in fatal road accidents in England in 1922. 

(a) Approximate the sample means of the ages of the males. 

(b) Approximate the sample means of the ages of the females. 
(c) Approximate the quartiles of the males killed.

(d) Approximate the quartiles of the females killed. 



Age Number of Males Number of Females 

0-5 120 67 

5-10 184 120 

10-15 44 22 

15-20 24 15 

20-30 23 25 

30-40 50 22 

40-50 60 40 

50-60 102 76 

60-70 167 104 

70-80 150 90 

80-100 49 27 



12. The following are the percentages of ash content in 12 samples of coal found in 
close proximity: 

9.2, 14.1, 9.8, 12.4, 16.0, 12.6, 22.7, 18.9, 21.0, 14.5, 20.4, 16.9

Find the 

(a) sample mean, and 

(b) sample standard deviation of these percentages. 

13. Using the table given in Problem 4, find the sample variance of the average travel 
time for those states in the 

(a) south Atlantic; 

(b) mountain region. 

14. The sample mean and sample variance of five data values are, respectively, x̄ = 104
and s = 4. If three of the data values are 102, 100, 105, what are the other two 
data values? 

15. The following table gives the average annual pay, per state, in the years 1992 and 
1993. 

(a) Do you think that the sample mean of the averages for the 50 states will equal 
the value given for the entire United States? 

(b) If the answer to part (a) is no, explain what other information aside from just 
the 50 averages would be needed to determine the sample mean salary for the 
entire country. Also, explain how you would use the additional information 
to compute this quantity. 
Average Annual Pay, by State: 1992 and 1993
[In dollars, except percent change. For workers covered by State unemployment insurance laws and for 
Federal civilian workers covered by unemployment compensation for Federal employees, approximately 
96 percent of wage and salary civilian employment in 1993. Excludes most agricultural workers on small 
farms, all Armed Forces, elected officials in most States, railroad employees, most domestic workers, 
most student workers at school, employees of certain nonprofit organizations, and most self-employed 
individuals. Pay includes bonuses, cash value of meals and lodging, and tips and other gratuities.] 



                         Average Annual Pay
State                    1992      1993
United States            25,897    26,362
Alabama                  22,340    22,786
Alaska                   31,825    32,336
Arizona                  23,153    23,501
Arkansas                 20,108    20,337
California               28,902    29,468
Colorado                 25,040    25,682
Connecticut              32,603    33,169
Delaware                 26,596    27,143
District of Columbia     37,951    39,199
Florida                  23,145    23,571
Georgia                  24,373    24,867
Hawaii                   25,538    26,325
Idaho                    20,649    21,188
Illinois                 27,910    28,420
Indiana                  23,570    24,109
Iowa                     20,937    21,441
Kansas                   21,982    22,430
Kentucky                 21,858    22,170
Louisiana                22,342    22,632
Maine                    21,808    22,026
Maryland                 27,145    27,684
Massachusetts            29,664    30,229
Michigan                 27,463    28,260
Minnesota                25,324    25,711
Mississippi              19,237    19,694
Missouri                 23,550    23,898
Montana                  19,378    19,932
Nebraska                 20,355    20,815
Nevada                   24,743    25,461
New Hampshire            24,866    24,962
New Jersey               32,073    32,716
New Mexico               21,051    21,731
New York                 32,399    32,919
North Carolina           22,249    22,770
North Dakota             18,945    19,382
Ohio                     24,845    25,339
Oklahoma                 21,698    22,003
Oregon                   23,514    24,093
Pennsylvania             25,785    26,274
Rhode Island             24,315    24,889
South Carolina           21,398    21,928
South Dakota             18,016    18,613
Tennessee                22,807    23,368
Texas                    25,088    25,545
Utah                     21,976    22,250
Vermont                  22,360    22,704
Virginia                 24,940    25,496
Washington               25,553    25,760
West Virginia            22,168    22,373
Wisconsin                23,008    23,610
Wyoming                  21,215    21,745



Source: U.S. Bureau of Labor Statistics, Employment and Wages Annual Averages 1993; and USDL News Release 94-451, Average Annual
Pay by State and Industry, 1993.



(c) Find the sample median of the averages for 1992 and for 1993. 

(d) Find the sample mean of the 1992 averages of the first 10 states listed. 

(e) Find the sample standard deviation of the 1993 averages of the last 10 states 
listed. 
16. The following data represent the lifetimes (in hours) of a sample of 40 transistors:

112, 121, 126, 108, 141, 104, 136, 134 
121, 118, 143, 116, 108, 122, 127, 140 
113, 117, 126, 130, 134, 120, 131, 133 
118, 125, 151, 147, 137, 140, 132, 119 
110, 124, 132, 152, 135, 130, 136, 128 

(a) Determine the sample mean, median, and mode. 

(b) Give a cumulative relative frequency plot of these data. 

17. An experiment measuring the percent shrinkage on drying of 50 clay specimens 
produced the following data: 



18.2, 21.2, 23.1, 18.5, 15.6, 20.8, 19.4, 15.4, 21.2, 13.4
16.4, 18.7, 18.2, 19.6, 14.3, 16.6, 24.0, 17.6, 17.8, 20.2
17.4, 23.6, 17.5, 20.3, 16.6, 19.3, 18.5, 19.3, 21.2, 13.9
20.5, 19.0, 17.6, 22.3, 18.4, 21.2, 20.4, 21.4, 20.3, 20.1
19.6, 20.6, 14.8, 19.7, 20.5, 18.0, 20.8, 15.8, 23.1, 17.0



(a) Draw a stem and leaf plot of these data. 

(b) Compute the sample mean, median, and mode. 

(c) Compute the sample variance. 

(d) Group the data into class intervals of size 1 percent starting with the value
13.0, and draw the resulting histogram.

(e) For the grouped data, acting as if each of the data points in an interval were
actually located at the midpoint of that interval, compute the sample mean
and sample variance and compare these with the results obtained in parts (b)
and (c). Why do they differ?
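The comparison in part (e) can be sketched in Python. This is a minimal illustration, assuming class intervals [13, 14), [14, 15), ... that include their left end point but not their right one; the helper name `grouped_midpoints` is ours, not the text's.

```python
import math
import statistics

# Percent shrinkage of the 50 clay specimens in Problem 17
shrinkage = [
    18.2, 21.2, 23.1, 18.5, 15.6, 20.8, 19.4, 15.4, 21.2, 13.4,
    16.4, 18.7, 18.2, 19.6, 14.3, 16.6, 24.0, 17.6, 17.8, 20.2,
    17.4, 23.6, 17.5, 20.3, 16.6, 19.3, 18.5, 19.3, 21.2, 13.9,
    20.5, 19.0, 17.6, 22.3, 18.4, 21.2, 20.4, 21.4, 20.3, 20.1,
    19.6, 20.6, 14.8, 19.7, 20.5, 18.0, 20.8, 15.8, 23.1, 17.0,
]

def grouped_midpoints(data, start=13.0, width=1.0):
    """Replace each value by the midpoint of its class interval
    [start + k*width, start + (k+1)*width), left end point included."""
    mids = []
    for x in data:
        k = math.floor((x - start) / width)
        mids.append(start + (k + 0.5) * width)
    return mids

mids = grouped_midpoints(shrinkage)
raw_mean, raw_var = statistics.mean(shrinkage), statistics.variance(shrinkage)
grp_mean, grp_var = statistics.mean(mids), statistics.variance(mids)
print(f"raw data:     mean = {raw_mean:.3f}, variance = {raw_var:.3f}")
print(f"grouped data: mean = {grp_mean:.3f}, variance = {grp_var:.3f}")
```

Replacing a value by the midpoint of its interval moves it by at most half a class width, so here the grouped mean can differ from the raw mean by at most 0.5; the variances differ because the positions of the data within each interval are discarded.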

18. A computationally efficient way to compute the sample mean and sample variance
of the data set $x_1, x_2, \ldots, x_n$ is as follows. Let

$$\bar{x}_j = \frac{\sum_{i=1}^{j} x_i}{j}, \qquad j = 1, 2, \ldots, n$$

be the sample mean of the first $j$ data values; and let

$$s_j^2 = \frac{\sum_{i=1}^{j} (x_i - \bar{x}_j)^2}{j - 1}, \qquad j = 2, \ldots, n$$

be the sample variance of the first $j$, $j \geq 2$, values. Then, with $s_1^2 = 0$, it can be
shown that

$$\bar{x}_{j+1} = \bar{x}_j + \frac{x_{j+1} - \bar{x}_j}{j + 1}$$

and

$$s_{j+1}^2 = \left(1 - \frac{1}{j}\right) s_j^2 + (j + 1)\left(\bar{x}_{j+1} - \bar{x}_j\right)^2$$



(a) Use the preceding formulas to compute the sample mean and sample variance 
of the data values 3, 4, 7, 2, 9, 6. 

(b) Verify your results in part (a) by computing as usual. 

(c) Verify the formula given above for $\bar{x}_{j+1}$ in terms of $\bar{x}_j$.
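A minimal Python sketch of the updating scheme in Problem 18: it applies the two recursions one value at a time, starting from $\bar{x}_1 = x_1$ and $s_1^2 = 0$, and can be compared with a direct computation as in part (b). The function name `running_mean_variance` is our own.

```python
import statistics

def running_mean_variance(data):
    """Sample mean and sample variance via the updating formulas of Problem 18."""
    if not data:
        raise ValueError("need at least one value")
    mean, var = data[0], 0.0            # xbar_1 = x_1 and s_1^2 = 0
    for j in range(1, len(data)):       # j = number of values already processed
        new_mean = mean + (data[j] - mean) / (j + 1)
        var = (1 - 1 / j) * var + (j + 1) * (new_mean - mean) ** 2
        mean = new_mean
    return mean, var

values = [3, 4, 7, 2, 9, 6]
mean, var = running_mean_variance(values)
print(mean, var)
print(statistics.mean(values), statistics.variance(values))   # direct check, part (b)
```

Each new value is folded in with a constant amount of work, so the data never need to be stored or passed over twice.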

19. Use the data concerning the prices of single-family homes provided in Problem 10 
to find the 

(a) 10 percentile of the median prices; 

(b) 40 percentile of the median prices; 

(c) 90 percentile of the median prices. 

20. Use the following table to find the quartiles of the average annual pay in the 
specified areas. 



Average Annual Pay by New York State Metropolitan Areas, 1999

Metropolitan area       Average annual pay ($)
Albany-Sch'dy-Troy      31,901
Binghamton              29,167
Buffalo-Niagara Falls   30,487
Dutchess County         35,256
Elmira                  26,603
Glens Falls             26,140
Jamestown               24,813
Nassau-Suffolk          36,944
New York City           52,351
Newburgh, NY-PA         27,671
Rochester               32,588
Syracuse                30,423
Utica-Rome              25,881
US metro area avg.      34,868

Source: U.S. Bureau of Labor Statistics data.
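The quartiles in Problem 20 are the sample 25th, 50th, and 75th percentiles of the 13 metropolitan areas (the US metro-area average is a reference value, not one of the areas). The Python sketch below assumes the following percentile rule: if $np$ is not an integer, take the value in position $\lceil np \rceil$ of the ordered data; if $np$ is an integer, average the values in positions $np$ and $np + 1$. The function name `sample_percentile` is hypothetical.

```python
import math
from fractions import Fraction

def sample_percentile(data, p):
    """Sample 100p percentile for 0 < p < 1 (assumed rule: if n*p is not an integer,
    take the value in position ceil(n*p) of the sorted data; otherwise average the
    values in positions n*p and n*p + 1)."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    xs = sorted(data)
    pos = len(xs) * Fraction(str(p))   # exact arithmetic, so 13 * 0.25 is exactly 13/4
    if pos.denominator == 1:
        k = int(pos)
        return (xs[k - 1] + xs[k]) / 2
    return xs[math.ceil(pos) - 1]

# Average annual pay, 1999, for the 13 metropolitan areas
pay = {
    "Albany-Sch'dy-Troy": 31901, "Binghamton": 29167, "Buffalo-Niagara Falls": 30487,
    "Dutchess County": 35256, "Elmira": 26603, "Glens Falls": 26140,
    "Jamestown": 24813, "Nassau-Suffolk": 36944, "New York City": 52351,
    "Newburgh, NY-PA": 27671, "Rochester": 32588, "Syracuse": 30423,
    "Utica-Rome": 25881,
}
quartiles = [sample_percentile(pay.values(), p) for p in (0.25, 0.5, 0.75)]
print(quartiles)   # [26603, 30423, 32588]
```

Converting `p` through `Fraction(str(p))` keeps the integer test on $np$ exact; with ordinary floats a product such as `n * 0.1` can land just above or below an integer.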

21. Use the following figure, which gives the amounts of federal research money given 
to 15 universities in 1992, to answer this problem. 
[Bar chart, millions of dollars. Universities shown: Johns Hopkins University; Mass. Institute of Technology; Stanford University; University of Washington; University of Michigan; University of CA-Los Angeles; University of CA-San Diego; University of CA-San Francisco; University of Wisconsin-Madison; Columbia University-Main Division; University of Pennsylvania; University of Minnesota; Harvard University; Yale University; University of Pittsburgh.]

Source: Chart prepared by U.S. Bureau of the Census.

Top 15 universities: federal research and development obligations, 1992.



(a) Which universities were given more than $225 million? 

(b) Approximate the sample mean of the amounts given to these universities. 

(c) Approximate the sample variance of the amounts given to these universities. 

(d) Approximate the quartiles of the amounts given to these universities. 

22. Use the part of the table given in Problem 4 that gives the percentage of workers 
in each state that use public transportation to get to work to draw a box plot of 
these 50 percentages. 

23. The following table gives the numbers of dogs, categorized by breed, registered in 
the American Kennel Club in 2000. Represent these numbers in a box plot. 

24. The average particulate concentration, in micrograms per cubic meter, was 
measured in a petrochemical complex at 36 randomly chosen times, with the 
following concentrations resulting: 

5, 18, 15, 7, 23, 220, 130, 85, 103, 25, 80, 7, 24, 6, 13, 65, 37, 25,
24, 65, 82, 95, 77, 15, 70, 110, 44, 28, 33, 81, 29, 14, 45, 92, 17, 53



(a) Represent the data in a histogram. 

(b) Is the histogram approximately normal? 
Rank  Breed                      2000
   1  Retrievers (Labrador)   172,841
   2  Retrievers (Golden)      66,300
   3  German Shepherd Dogs     57,660
   4  Dachshunds               54,773
   5  Beagles                  52,026
   6  Poodles                  45,868
   7  Yorkshire Terriers       43,574
   8  Chihuahuas               43,096
   9  Boxers                   38,803
  10  Shih Tzu                 37,599
  11  Rottweilers              37,355
  12  Pomeranians              33,568
  13  Miniature Schnauzers     30,472
  14  Spaniels (Cocker)        29,393
  15  Pugs                     24,373
  16  Shetland Sheepdogs       23,866
  17  Miniature Pinschers      22,020
  18  Boston Terriers          19,922
  19  Siberian Huskies         17,551
  20  Maltese                  17,446



Source: American Kennel Club, New York, NY: Dogs registered during 
calendar year shown. 

25. A chemical engineer desiring to study the evaporation rate of water from brine 
evaporation beds obtained data on the number of inches of evaporation in each 
of 55 July days spread over 4 years. The data are given in the following stem and 
leaf plot, which shows that the smallest data value was .02 inches, and the largest 
.56 inches. 



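0 | 2, 6
1 | 1, 4
2 | 1, 1, 1, 3, 3, 4, 5, 5, 5, 6, 9
3 | 0, 0, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 7, 8, 9
4 | 0, 1, 2, 2, 2, 3, 4, 4, 4, 5, 5, 5, 7, 8, 8, 8, 9, 9
5 | 2, 5, 6

(Stems are tenths of an inch and leaves are hundredths, so 0 | 2 represents .02 inches.)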



Find the 

(a) sample mean; 

(b) sample median; 

(c) sample standard deviation of these data. 

(d) Do the data appear to be approximately normal? 

(e) What percentage of data values are within 1 standard deviation of the mean? 
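A short Python sketch that rebuilds the 55 evaporation values from the stem-and-leaf plot and computes the quantities asked for in parts (a), (b), (c), and (e). It reads the stems as tenths of an inch and the leaves as hundredths, which agrees with the stated smallest value .02 and largest value .56; the variable names are our own.

```python
import statistics

# Stem-and-leaf plot of Problem 25: stem = tenths of an inch, leaf = hundredths
plot = {
    0: [2, 6],
    1: [1, 4],
    2: [1, 1, 1, 3, 3, 4, 5, 5, 5, 6, 9],
    3: [0, 0, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 7, 8, 9],
    4: [0, 1, 2, 2, 2, 3, 4, 4, 4, 5, 5, 5, 7, 8, 8, 8, 9, 9],
    5: [2, 5, 6],
}
# Rebuild the data in inches: stem 3, leaf 5 -> 35 hundredths -> 0.35
evaporation = [(10 * stem + leaf) / 100 for stem, leaves in plot.items() for leaf in leaves]

mean = statistics.mean(evaporation)
median = statistics.median(evaporation)
s = statistics.stdev(evaporation)          # sample standard deviation (divisor n - 1)
within_one_sd = sum(abs(x - mean) <= s for x in evaporation) / len(evaporation)
print(f"n = {len(evaporation)}, mean = {mean:.4f}, median = {median:.2f}, s = {s:.4f}")
print(f"proportion within one standard deviation: {within_one_sd:.3f}")
```

The last proportion can be set beside the roughly 68 percent that the empirical rule suggests for approximately normal data when answering part (d).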
26. The following are the grade point averages of 30 students recently admitted to the
graduate program in the Department of Industrial Engineering and Operations 
Research at the University of California at Berkeley. 

3.46, 3.72, 3.95, 3.55, 3.62, 3.80, 3.86, 3.71, 3.56, 3.49, 3.96, 3.90, 3.70, 3.61, 
3.72, 3.65, 3.48, 3.87, 3.82, 3.91, 3.69, 3.67, 3.72, 3.66, 3.79, 3.75, 3.93, 3.74, 
3.50, 3.83

(a) Represent the preceding data in a stem and leaf plot. 

(b) Calculate the sample mean $\bar{x}$.

(c) Calculate the sample standard deviation s.

(d) Determine the proportion of the data values that lies within $\bar{x} \pm 1.5s$ and
compare with the lower bound given by Chebyshev's inequality.

(e) Determine the proportion of the data values that lies within $\bar{x} \pm 2s$ and
compare with the lower bound given by Chebyshev's inequality.
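Parts (d) and (e) can be sketched in Python as follows. The sketch compares the actual proportion of GPAs within $\bar{x} \pm ks$ with the Chebyshev lower bound $1 - 1/k^2$, which the sample proportion always exceeds; the helper `proportion_within` is a hypothetical name.

```python
import statistics

gpa = [3.46, 3.72, 3.95, 3.55, 3.62, 3.80, 3.86, 3.71, 3.56, 3.49,
       3.96, 3.90, 3.70, 3.61, 3.72, 3.65, 3.48, 3.87, 3.82, 3.91,
       3.69, 3.67, 3.72, 3.66, 3.79, 3.75, 3.93, 3.74, 3.50, 3.83]

xbar = statistics.mean(gpa)
s = statistics.stdev(gpa)          # sample standard deviation (divisor n - 1)

def proportion_within(data, center, spread, k):
    """Fraction of the data lying within center +/- k * spread."""
    return sum(abs(x - center) <= k * spread for x in data) / len(data)

for k in (1.5, 2):
    actual = proportion_within(gpa, xbar, s, k)
    bound = 1 - 1 / k ** 2         # Chebyshev lower bound
    print(f"k = {k}: actual proportion {actual:.3f}, Chebyshev bound {bound:.3f}")
```

The bound holds for any data set, so it is usually far below the actual proportion; that gap is what the comparison in parts (d) and (e) is meant to show.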

27. Do the data in Problem 26 appear to be approximately normal? For parts (d) and
(e) of Problem 26, compare the approximate proportions given by the empirical
rule with the actual proportions.

28. Would you expect that a histogram of the weights of all the members of a health 
club would be approximately normal? 

29. Use the data of Problem 16. 

(a) Compute the sample mean and sample median. 

(b) Are the data approximately normal? 

(c) Compute the sample standard deviation s. 

(d) What percentage of the data fall within $\bar{x} \pm 1.5s$?

(e) Compare your answer in part (d) to that given by the empirical rule. 

(f) Compare your answer in part (d) to the bound given by Chebyshev's 
inequality. 

30. Use the data concerning the first 10 states listed in the table given in Problem 15. 

(a) Draw a scatter diagram relating the 1992 and 1993 salaries. 

(b) Determine the sample correlation coefficient. 

31. The following table gives the median salaries for recent U.S. doctorate recipients, 
categorized by scientific field and type of employment. Draw a scatter diagram 
relating salaries in private firms to those in government, and determine the sample 
correlation coefficient. 

32. Use the table to find the sample correlation coefficients between salaries in 

(a) government and universities 

(b) private firms and universities. 
Median salaries for recent U.S. doctorate recipients (1-3 years after degree), by sector of employment: 1999
(Dollars)

Ph.D. field             Total    Private         Government  Tenure-track at       Postdoc  Other
                                 noneducational              four-year institution          educational
Total                   48,800   68,000          55,000      43,400                30,000   33,000
Computer sciences       75,000   82,000          66,000      53,000                —        60,000
Engineering             66,700   70,000          65,000      56,300                38,000   55,000
Life sciences           35,000   61,000          48,000      42,500                28,000   36,000
Mathematical sciences   45,000   60,500          55,200      39,500                40,000   38,000
Social sciences         45,000   53,000          52,400      40,000                30,500   35,000
Physical sciences       52,000   64,000          58,000      39,400                32,700   39,000

— = Fewer than 50 cases.

Source: National Science Foundation, Division of Science Resources Statistics (NSF/SRS), Survey of Doctorate Recipients, 1999.
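For Problem 31, a minimal Python sketch of the sample correlation coefficient $r = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}$, applied to the private-firm and government columns. It assumes the six individual fields are the data points and leaves out the "Total" row, which summarizes all fields; the function name `sample_correlation` is our own.

```python
import math

def sample_correlation(x, y):
    """Sample correlation coefficient of the paired data (x_i, y_i)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Fields: computer, engineering, life, mathematical, social, physical sciences
private = [82000, 70000, 61000, 60500, 53000, 64000]
government = [66000, 65000, 48000, 55200, 52400, 58000]
r = sample_correlation(private, government)
print(f"r = {r:.3f}")
```

The same function can be reused for Problem 32 by passing the tenure-track column together with the government or private-firm column.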



33. Using data on the first 10 cities listed in Table 2.5, draw a scatter diagram and find 
the sample correlation coefficient between the January and July temperatures. 

34. Verify property 3 of the sample correlation coefficient. 

35. Verify property 4 of the sample correlation coefficient. 

36. In a study of children in grades 2 through 4, a researcher gave each student 
a reading test. When looking at the resulting data the researcher noted a positive
correlation between a student's reading test score and height. The researcher
concluded that taller children read better because they can more easily see the 
blackboard. What do you think? 




ELEMENTS OF PROBABILITY 



3.1 INTRODUCTION 

The concept of the probability of a particular event of an experiment is subject to various 
meanings or interpretations. For instance, if a geologist is quoted as saying that "there is 
a 60 percent chance of oil in a certain region," we all probably have some intuitive idea 
as to what is being said. Indeed, most of us would probably interpret this statement in one 
of two possible ways: either by imagining that 

1. the geologist feels that, over the long run, in 60 percent of the regions whose
outward environmental conditions are very similar to the conditions that prevail 
in the region under consideration, there will be oil; or, by imagining that 

2. the geologist believes that it is more likely that the region will contain oil than it is 
that it will not; and in fact .6 is a measure of the geologist's belief in the hypothesis 
that the region will contain oil. 

The two foregoing interpretations of the probability of an event are referred to as being 
the frequency interpretation and the subjective (or personal) interpretation of probability. 
In the frequency interpretation, the probability of a given outcome of an experiment is 
considered as being a "property" of that outcome. It is imagined that this property can be 
operationally determined by continual repetition of the experiment — the probability of 
the outcome will then be observable as being the proportion of the experiments that result 
in the outcome. This is the interpretation of probability that is most prevalent among 
scientists. 

In the subjective interpretation, the probability of an outcome is not thought of as being 
a property of the outcome but rather is considered a statement about the beliefs of the 
person who is quoting the probability, concerning the chance that the outcome will occur. 
Thus, in this interpretation, probability becomes a subjective or personal concept and has 
no meaning outside of expressing one's degree of belief. This interpretation of probability 
is often favored by philosophers and certain economic decision makers. 

Regardless of which interpretation one gives to probability, however, there is a general
consensus that the mathematics of probability are the same in either case. For instance, 
if you think that the probability that it will rain tomorrow is .3 and you feel that the 
probability that it will be cloudy but without any rain is .2, then you should feel that the 
probability that it will either be cloudy or rainy is .5 independently of your individual 
interpretation of the concept of probability. In this chapter, we present the accepted rules, 
or axioms, used in probability theory. As a preliminary to this, however, we need to study 
the concept of the sample space and the events of an experiment. 

3.2 SAMPLE SPACE AND EVENTS 

Consider an experiment whose outcome is not predictable with certainty in advance. 
Although the outcome of the experiment will not be known in advance, let us suppose 
that the set of all possible outcomes is known. This set of all possible outcomes of an 
experiment is known as the sample space of the experiment and is denoted by S. Some 
examples are the following. 

1. If the outcome of an experiment consists in the determination of the sex of a
newborn child, then 

S = {g,b} 

where the outcome g means that the child is a girl and b that it is a boy. 

2. If the experiment consists of the running of a race among the seven horses having
post positions 1, 2, 3, 4, 5, 6, 7, then

S = {all orderings of (1, 2, 3, 4, 5, 6, 7)}

The outcome (2, 3, 1, 6, 5, 4, 7) means, for instance, that the number 2 horse is
first, then the number 3 horse, then the number 1 horse, and so on.

3. Suppose we are interested in determining the amount of dosage that must be given
to a patient until that patient reacts positively. One possible sample space for this
experiment is to let S consist of all the positive numbers. That is, let

$S = (0, \infty)$

where the outcome would be x if the patient reacts to a dosage of value x but not to 
any smaller dosage. 

Any subset E of the sample space is known as an event. That is, an event is a set consisting 
of possible outcomes of the experiment. If the outcome of the experiment is contained in 
E, then we say that E has occurred. Some examples of events are the following. 
In Example 1 if E = {g}, then E is the event that the child is a girl. Similarly, if
F = {b}, then F is the event that the child is a boy.
In Example 2 if 

E = {all outcomes in S starting with a 3} 

then E is the event that the number 3 horse wins the race. 

For any two events E and F of a sample space S, we define the new event $E \cup F$, called
the union of the events E and F, to consist of all outcomes that are either in E or in F or in
both E and F. That is, the event $E \cup F$ will occur if either E or F occurs. For instance, in
Example 1 if E = {g} and F = {b}, then $E \cup F = \{g, b\}$. That is, $E \cup F$ would be the
whole sample space S. In Example 2 if E = {all outcomes starting with 6} is the event that
the number 6 horse wins and F = {all outcomes having 6 in the second position} is the
event that the number 6 horse comes in second, then $E \cup F$ is the event that the number
6 horse comes in either first or second.

Similarly, for any two events E and F, we may also define the new event EF, called the
intersection of E and F, to consist of all outcomes that are in both E and F. That is, the
event EF will occur only if both E and F occur. For instance, in Example 3 if E = (0, 5)
is the event that the required dosage is less than 5 and F = (2, 10) is the event that it is
between 2 and 10, then EF = (2, 5) is the event that the required dosage is between 2
and 5. In Example 2 if E = {all outcomes ending in 5} is the event that horse number
5 comes in last and F = {all outcomes starting with 5} is the event that horse number 5
comes in first, then the event EF does not contain any outcomes and hence cannot occur.
To give such an event a name, we shall refer to it as the null event and denote it by $\emptyset$.
Thus $\emptyset$ refers to the event consisting of no outcomes. If $EF = \emptyset$, implying that E and F
cannot both occur, then E and F are said to be mutually exclusive.

For any event E, we define the event $E^c$, referred to as the complement of E, to consist
of all outcomes in the sample space S that are not in E. That is, $E^c$ will occur if and only
if E does not occur. In Example 1 if E = {b} is the event that the child is a boy, then
$E^c = \{g\}$ is the event that it is a girl. Also note that since the experiment must result in
some outcome, it follows that $S^c = \emptyset$.

For any two events E and F, if all of the outcomes in E are also in F, then we say that
E is contained in F and write $E \subset F$ (or equivalently, $F \supset E$). Thus if $E \subset F$, then the
occurrence of E necessarily implies the occurrence of F. If $E \subset F$ and $F \subset E$, then we say
that E and F are equal (or identical) and we write E = F.

We can also define unions and intersections of more than two events. In particular,
the union of the events $E_1, E_2, \ldots, E_n$, denoted either by $E_1 \cup E_2 \cup \cdots \cup E_n$ or by
$\bigcup_{i=1}^{n} E_i$, is defined to be the event consisting of all outcomes that are in $E_i$ for at least one
$i = 1, 2, \ldots, n$. Similarly, the intersection of the events $E_i$, $i = 1, 2, \ldots, n$, denoted by
$E_1 E_2 \cdots E_n$, is defined to be the event consisting of those outcomes that are in all of the
events $E_i$, $i = 1, 2, \ldots, n$. In other words, the union of the $E_i$ occurs when at least one of
the events $E_i$ occurs; the intersection occurs when all of the events $E_i$ occur.
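Python's built-in `set` type mirrors these operations: `|` is union, `&` is intersection, and taking the difference from the sample space gives the complement. The sketch below, offered only as an illustration, builds the sample space of Example 2 and checks the events discussed above; the variable names are our own.

```python
from itertools import permutations

S = set(permutations(range(1, 8)))      # all orderings of the seven horses

E = {o for o in S if o[0] == 6}         # number 6 horse wins
F = {o for o in S if o[1] == 6}         # number 6 horse comes in second
first_or_second = E | F                 # union of E and F
print(len(S), len(E), len(F), len(first_or_second))   # 5040 720 720 1440

last5 = {o for o in S if o[-1] == 5}    # horse 5 comes in last
first5 = {o for o in S if o[0] == 5}    # horse 5 comes in first
print(last5 & first5 == set())          # True: the intersection is the null event

not_E = S - E                           # complement of E within S
print(len(not_E))                       # 4320
```

Because the events are mutually exclusive in both checks, the sizes simply add: $|E \cup F| = 720 + 720$, and $E$ and its complement together account for all 5040 orderings.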
3.3 VENN DIAGRAMS AND THE ALGEBRA OF EVENTS

A graphical representation of events that is very useful for illustrating logical relations
among them is the Venn diagram. The sample space S is represented as consisting of all
the points in a large rectangle, and the events E, F, G, . . ., are represented as consisting of
all the points in given circles within the rectangle. Events of interest can then be indicated
by shading appropriate regions of the diagram. For instance, in the three Venn diagrams
shown in Figure 3.1, the shaded areas represent respectively the events $E \cup F$, $EF$, and $E^c$.
The Venn diagram of Figure 3.2 indicates that $E \subset F$.

The operations of forming unions, intersections, and complements of events obey 
certain rules not dissimilar to the rules of algebra. We list a few of these. 



Commutative law    $E \cup F = F \cup E$                      $EF = FE$
Associative law    $(E \cup F) \cup G = E \cup (F \cup G)$    $(EF)G = E(FG)$
Distributive law   $(E \cup F)G = EG \cup FG$                 $EF \cup G = (E \cup G)(F \cup G)$



These relations are verified by showing that any outcome that is contained in the event on 
the left side of the equality is also contained in the event on the right side and vice versa. 
One way of showing this is by means of Venn diagrams. For instance, the distributive law 
may be verified by the sequence of diagrams shown in Figure 3.3. 






FIGURE 3.1 Venn diagrams. (a) Shaded region: $E \cup F$. (b) Shaded region: $EF$. (c) Shaded region: $E^c$.

FIGURE 3.2 Venn diagram: $E \subset F$.



FIGURE 3.3 Proving the distributive law. (a) Shaded region: $EG$. (b) Shaded region: $FG$. (c) Shaded region: $(E \cup F)G$. Hence $(E \cup F)G = EG \cup FG$.



The following useful relationships between the three basic operations of forming unions,
intersections, and complements of events are known as DeMorgan's laws:

$$(E \cup F)^c = E^c F^c$$
$$(EF)^c = E^c \cup F^c$$
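Identities such as the distributive laws and DeMorgan's laws can also be checked mechanically on a small sample space by testing every possible choice of events, which is a finite analogue of the Venn-diagram argument (a check on one example, not a proof). In this sketch the sample space {1, 2, 3} and the helper names are arbitrary choices.

```python
from itertools import combinations, product

S = frozenset({1, 2, 3})   # a small sample space chosen for illustration
events = [frozenset(c) for r in range(len(S) + 1) for c in combinations(sorted(S), r)]

def complement(A):
    return S - A

def laws_hold(E, F, G):
    return (
        ((E | F) & G) == ((E & G) | (F & G))                      # (E u F)G = EG u FG
        and ((E & F) | G) == ((E | G) & (F | G))                  # EF u G = (E u G)(F u G)
        and complement(E | F) == (complement(E) & complement(F))  # DeMorgan
        and complement(E & F) == (complement(E) | complement(F))  # DeMorgan
    )

print(len(events), all(laws_hold(E, F, G) for E, F, G in product(events, repeat=3)))  # 8 True
```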



3.4 AXIOMS OF PROBABILITY 

It appears to be an empirical fact that if an experiment is continually repeated under the 
exact same conditions, then for any event E, the proportion of time that the outcome is 
contained in E approaches some constant value as the number of repetitions increases. For 
instance, if a coin is continually flipped, then the proportion of flips resulting in heads will 
approach some value as the number of flips increases. It is this constant limiting frequency 
that we often have in mind when we speak of the probability of an event. 

From a purely mathematical viewpoint, we shall suppose that for each event E of an
experiment having a sample space S there is a number, denoted by $P(E)$, that is in accord
with the following three axioms.

AXIOM 1

$$0 \leq P(E) \leq 1$$

AXIOM 2

$$P(S) = 1$$

AXIOM 3

For any sequence of mutually exclusive events $E_1, E_2, \ldots$ (that is, events for which $E_i E_j = \emptyset$
when $i \neq j$),

$$P\left(\bigcup_{i=1}^{n} E_i\right) = \sum_{i=1}^{n} P(E_i), \qquad n = 1, 2, \ldots, \infty$$

We call $P(E)$ the probability of the event E.

Thus, Axiom 1 states that the probability that the outcome of the experiment is
contained in E is some number between 0 and 1. Axiom 2 states that, with probability 1,
the outcome will be a member of the sample space S. Axiom 3 states that for any set of
mutually exclusive events the probability that at least one of these events occurs is equal to
the sum of their respective probabilities.

It should be noted that if we interpret $P(E)$ as the relative frequency of the event E
when a large number of repetitions of the experiment are performed, then $P(E)$ would
indeed satisfy the above axioms. For instance, the proportion (or frequency) of time that
the outcome is in E is clearly between 0 and 1, and the proportion of time that it is in S
is 1 (since all outcomes are in S). Also, if E and F have no outcomes in common, then
the proportion of time that the outcome is in either E or F is the sum of their respective
frequencies. As an illustration of this last statement, suppose the experiment consists of
the rolling of a pair of dice and suppose that E is the event that the sum is 2, 3, or 12 and
F is the event that the sum is 7 or 11. Then if event E occurs 11 percent of the time
and event F 22 percent of the time, then 33 percent of the time the sum will be
2, 3, 12, 7, or 11.
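The dice illustration can be checked by enumeration. This Python sketch assumes fair dice, so that the 36 ordered outcomes are equally likely (the setting of Section 3.5); under that assumption $P(E) = 4/36 = 1/9 \approx 11.1$ percent and $P(F) = 8/36 = 2/9 \approx 22.2$ percent, matching the rounded figures above. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))   # 36 outcomes, assumed equally likely

def prob(event):
    """Proportion of the 36 outcomes for which event(roll) is true."""
    return Fraction(sum(1 for roll in rolls if event(roll)), len(rolls))

def E(roll):
    return sum(roll) in (2, 3, 12)

def F(roll):
    return sum(roll) in (7, 11)

def E_or_F(roll):
    return E(roll) or F(roll)

print(prob(E), prob(F), prob(E_or_F))   # 1/9 2/9 1/3
```

Since no roll has a sum in both lists, the probabilities add exactly, as Axiom 3 requires.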

These axioms will now be used to prove two simple propositions concerning
probabilities. We first note that E and $E^c$ are always mutually exclusive, and since $E \cup E^c = S$,
we have by Axioms 2 and 3 that

$$1 = P(S) = P(E \cup E^c) = P(E) + P(E^c)$$

Or equivalently, we have the following: 

PROPOSITION 3.4.1 

$$P(E^c) = 1 - P(E)$$

In other words, Proposition 3.4.1 states that the probability that an event does not occur 
is 1 minus the probability that it does occur. For instance, if the probability of obtaining
a head on the toss of a coin is $p$, the probability of obtaining a tail must be $1 - p$.

Our second proposition gives the relationship between the probability of the union of 
two events in terms of the individual probabilities and the probability of the intersection. 

PROPOSITION 3.4.2 

$$P(E \cup F) = P(E) + P(F) - P(EF)$$

Proof 

This proposition is most easily proven by the use of a Venn diagram as shown in Figure 
3.4. As the regions I, II, and III are mutually exclusive, it follows that 

$$P(E \cup F) = P(\text{I}) + P(\text{II}) + P(\text{III})$$
$$P(E) = P(\text{I}) + P(\text{II})$$
$$P(F) = P(\text{II}) + P(\text{III})$$



FIGURE 3.4 Venn diagram of events E and F in the sample space S, divided into the mutually exclusive regions I, II, and III.



which shows that

$$P(E \cup F) = P(E) + P(F) - P(\text{II})$$

and the proof is complete since II = EF.
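Proposition 3.4.2 can be checked numerically on an example where the events overlap. The Python sketch below assumes two fair dice (36 equally likely outcomes) and uses two hypothetical events chosen only for illustration: E, the first die is even, and F, the sum is at least 9.

```python
from fractions import Fraction
from itertools import product

rolls = set(product(range(1, 7), repeat=2))   # two fair dice: 36 equally likely outcomes

def P(A):
    return Fraction(len(A), len(rolls))

E = {r for r in rolls if r[0] % 2 == 0}   # hypothetical event: first die is even
F = {r for r in rolls if sum(r) >= 9}     # hypothetical event: sum is at least 9

lhs = P(E | F)
rhs = P(E) + P(F) - P(E & F)
print(lhs, rhs)   # 11/18 11/18
```

Here $EF$ contains 6 outcomes, so adding $P(E) = 18/36$ and $P(F) = 10/36$ without subtracting $P(EF)$ would count those outcomes twice.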

EXAMPLE 3.4a A total of 28 percent of American males smoke cigarettes, 7 percent smoke 
cigars, and 5 percent smoke both cigars and cigarettes. What percentage of males smoke 
neither cigars nor cigarettes? 

SOLUTION Let E be the event that a randomly chosen male is a cigarette smoker and let F
be the event that he is a cigar smoker. Then, the probability this person is either a cigarette
or a cigar smoker is

$$P(E \cup F) = P(E) + P(F) - P(EF) = .28 + .07 - .05 = .30$$

Thus the probability that the person smokes neither cigarettes nor cigars is $1 - .30 = .70$,
implying that 70 percent of American males smoke neither cigarettes nor cigars. ■

The odds of an event A is defined by

$$\frac{P(A)}{P(A^c)} = \frac{P(A)}{1 - P(A)}$$

Thus the odds of an event A tells how much more likely it is that A occurs than that it
does not occur. For instance if $P(A) = 3/4$, then $P(A)/(1 - P(A)) = 3$, so the odds is 3.
Consequently, it is 3 times as likely that A occurs as it is that it does not.
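The definition of odds translates into a one-line computation. This Python sketch uses exact fractions; the function name `odds` is our own, and it assumes $P(A) < 1$ so the denominator is nonzero.

```python
from fractions import Fraction

def odds(p):
    """Odds of an event with probability p: P(A) / (1 - P(A)), for 0 <= p < 1."""
    p = Fraction(p)
    if not 0 <= p < 1:
        raise ValueError("odds are defined here for 0 <= P(A) < 1")
    return p / (1 - p)

print(odds(Fraction(3, 4)))    # 3, the example above
print(odds(Fraction(3, 10)))   # 3/7, smoking cigarettes or cigars in Example 3.4a
```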

3.5 SAMPLE SPACES HAVING EQUALLY
LIKELY OUTCOMES

For a large number of experiments, it is natural to assume that each point in the sample
space is equally likely to occur. That is, for many experiments whose sample space S is a
finite set, say $S = \{1, 2, \ldots, N\}$, it is often natural to assume that

$$P(\{1\}) = P(\{2\}) = \cdots = P(\{N\}) = p \quad \text{(say)}$$
Now it follows from Axioms 2 and 3 that

$$1 = P(S) = P(\{1\}) + \cdots + P(\{N\}) = Np$$

which shows that

$$P(\{i\}) = p = 1/N$$

From this and Axiom 3 it follows that for any event E,

$$P(E) = \frac{\text{Number of points in } E}{N}$$



In words, if we assume that each outcome of an experiment is equally likely to occur, then 
the probability of any event E equals the proportion of points in the sample space that are 
contained in E. 
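The formula $P(E) = (\text{number of points in } E)/N$ translates directly into a counting function. The Python sketch assumes the sample space is small enough to list; the function name and the coin-tossing illustration are hypothetical, not from the text.

```python
from fractions import Fraction
from itertools import product

def equally_likely_probability(sample_space, event):
    """P(E) = (number of points in E) / N for N equally likely outcomes."""
    points = list(sample_space)
    in_event = sum(1 for s in points if event(s))
    return Fraction(in_event, len(points))

# Hypothetical illustration: three tosses of a fair coin, E = "exactly two heads"
tosses = list(product("HT", repeat=3))
p = equally_likely_probability(tosses, lambda s: s.count("H") == 2)
print(len(tosses), p)   # 8 3/8
```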

Thus, to compute probabilities it is often necessary to be able to effectively count the 
number of different ways that a given event can occur. To do this, we will make use of the 
following rule. 

BASIC PRINCIPLE OF COUNTING 

Suppose that two experiments are to be performed. Then if experiment 1 can result in 
any one of m possible outcomes and if, for each outcome of experiment 1, there are n 
possible outcomes of experiment 2, then together there are mn possible outcomes of the 
two experiments. 

Proof of the Basic Principle 

The basic principle can be proven by enumerating all the possible outcomes of the two 
experiments as follows: 

$$\begin{array}{cccc}
(1, 1), & (1, 2), & \ldots, & (1, n) \\
(2, 1), & (2, 2), & \ldots, & (2, n) \\
\vdots  &         &         &        \\
(m, 1), & (m, 2), & \ldots, & (m, n)
\end{array}$$

where we say that the outcome is (i, j) if experiment 1 results in its ith possible outcome
and experiment 2 then results in the jth of its possible outcomes. Hence, the set of
possible outcomes consists of m rows, each row containing n elements, which proves the
result. ■
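The enumeration in the proof is exactly what `itertools.product` produces. This Python sketch lists the pairs (i, j) row by row for a small hypothetical case, m = 3 and n = 4, and counts them; the function name is our own.

```python
from itertools import product

def enumerate_outcomes(m, n):
    """All pairs (i, j): experiment 1 gives its ith outcome, experiment 2 its jth."""
    return list(product(range(1, m + 1), range(1, n + 1)))

outcomes = enumerate_outcomes(3, 4)
for i in range(3):                        # the m rows of n pairs each
    print(outcomes[4 * i: 4 * (i + 1)])
print(len(outcomes))                      # 12 = 3 * 4
```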

EXAMPLE 3.5a Two balls are "randomly drawn" from a bowl containing 6 white and 5 
black balls. What is the probability that one of the drawn balls is white and the other black? 
SOLUTION If we regard the order in which the balls are selected as being significant, then
as the first drawn ball may be any of the 11 and the second any of the remaining 10, it
follows that the sample space consists of 11 • 10 = 110 points. Furthermore, there are
6 • 5 = 30 ways in which the first ball selected is white and the second black, and similarly
there are 5 • 6 = 30 ways in which the first ball is black and the second white. Hence,
assuming that "randomly drawn" means that each of the 110 points in the sample space is
equally likely to occur, then we see that the desired probability is

$$\frac{30 + 30}{110} = \frac{6}{11}$$
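Example 3.5a can be confirmed by listing all 110 ordered draws. In this Python sketch the balls are told apart by position, which matches the solution's assumption that order matters and all ordered draws are equally likely.

```python
from fractions import Fraction
from itertools import permutations

balls = ["white"] * 6 + ["black"] * 5   # label the 11 balls by color
# Ordered draws of two different balls (by position): 11 * 10 = 110 equally likely points
draws = list(permutations(range(len(balls)), 2))
one_of_each = sum(1 for i, j in draws if balls[i] != balls[j])
prob = Fraction(one_of_each, len(draws))
print(len(draws), one_of_each, prob)   # 110 60 6/11
```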

When there are more than two experiments to be performed the basic principle can be 
generalized as follows: 



Generalized Basic Principle of Counting 

If r experiments that are to be performed are such that the first one may
result in any of $n_1$ possible outcomes, and if for each of these $n_1$ possible
outcomes there are $n_2$ possible outcomes of the second experiment, and
if for each of the possible outcomes of the first two experiments there are
$n_3$ possible outcomes of the third experiment, and if ..., then there are
a total of $n_1 \cdot n_2 \cdots n_r$ possible outcomes of the r experiments.



As an illustration of this, let us determine the number of different ways n distinct objects 
can be arranged in a linear order. For instance, how many different ordered arrangements 
of the letters a, b, c are possible? By direct enumeration we see that there are 6; namely, abc, 
acb, bac, bca, cab, cba. Each one of these ordered arrangements is known as a permutation. 
Thus, there are 6 possible permutations of a set of 3 objects. This result could also have 
been obtained from the basic principle, since the first object in the permutation can be 
any of the 3, the second object in the permutation can then be chosen from any of the 
remaining 2, and the third object in the permutation is then chosen from the remaining 
one. Thus, there are 3 • 2 • 1 = 6 possible permutations.

Suppose now that we have n objects. Similar reasoning shows that there are

$$n(n - 1)(n - 2) \cdots 3 \cdot 2 \cdot 1$$

different permutations of the n objects. It is convenient to introduce the notation $n!$, which
is read "n factorial," for the foregoing expression. That is,

$$n! = n(n - 1)(n - 2) \cdots 3 \cdot 2 \cdot 1$$
Thus, for instance, 1! = 1, 2! = 2 • 1 = 2, 3! = 3 • 2 • 1 = 6, 4! = 4 • 3 • 2 • 1 = 24, and
so on. It is convenient to define 0! = 1. 
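Both the list of permutations and the factorial formula can be reproduced in a few lines of Python. The hand-written `factorial` below is a sketch for illustration (the standard library already provides `math.factorial`); its empty loop for n = 0 returns 1, matching the convention 0! = 1.

```python
import math
from itertools import permutations

# All ordered arrangements (permutations) of the letters a, b, c
arrangements = ["".join(p) for p in permutations("abc")]
print(arrangements)   # ['abc', 'acb', 'bac', 'bca', 'cab', 'cba']

def factorial(n):
    """n! = n(n - 1)...2 * 1, with 0! = 1 (the empty product)."""
    if n < 0:
        raise ValueError("factorial is defined here for n >= 0")
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result

print([factorial(n) for n in range(5)])    # [1, 1, 2, 6, 24]
print(factorial(10) == math.factorial(10))  # True
```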

EXAMPLE 3.5b Mr. Jones has 10 books that he is going to put on his bookshelf. Of these, 
4 are mathematics books, 3 are chemistry books, 2 are history books, and 1 is a language 
book. Jones wants to arrange his books so that all the books dealing with the same subject 
are together on the shelf. How many different arrangements are possible? 

SOLUTION There are 4! 3! 2! 1! arrangements such that the mathematics books are first 
in line, then the chemistry books, then the history books, and then the language book. 
Similarly, for each possible ordering of the subjects, there are 4! 3! 2! 1! possible arrange- 
ments. Hence, as there are 4! possible orderings of the subjects, the desired answer is 
4! 4! 3! 2! 1! = 6,912. ■ 
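The counting argument of Example 3.5b generalizes to any list of subject sizes: $k!$ orderings of the $k$ subjects times $n_i!$ orderings inside subject $i$. The Python sketch below implements that product and, as a sanity check, compares it with brute-force enumeration on small hypothetical shelves (listing all 10! arrangements would be slow). Both function names are our own.

```python
from itertools import permutations
from math import factorial, prod

def grouped_arrangements(counts):
    """Arrangements with each subject's books together: k! orders of the k subjects,
    times counts[i]! orders of the books within subject i."""
    return factorial(len(counts)) * prod(factorial(c) for c in counts)

def brute_force(counts):
    """Count such arrangements directly (feasible only for a handful of books)."""
    books = [(subject, i) for subject, c in enumerate(counts) for i in range(c)]
    total = 0
    for order in permutations(books):
        subjects = [s for s, _ in order]
        together = True
        for t, c in enumerate(counts):
            positions = [k for k, s in enumerate(subjects) if s == t]
            # the subject's books are together iff their positions form one run
            together = together and positions[-1] - positions[0] + 1 == c
        if together:
            total += 1
    return total

print(grouped_arrangements([4, 3, 2, 1]))                       # 6912
print(brute_force([2, 1, 1]), grouped_arrangements([2, 1, 1]))  # 12 12
```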

EXAMPLE 3.5c A class in probability theory consists of 6 men and 4 women. An exam is 
given and the students are ranked according to their performance. Assuming that no two 
students obtain the same score, (a) how many different rankings are possible? (b) If all 
rankings are considered equally likely, what is the probability that women receive the top 
4 scores? 

SOLUTION 

(a) Because each ranking corresponds to a particular ordered arrangement of the 10 
people, we see the answer to this part is 10! = 3,628,800. 

(b) Because there are 4! possible rankings of the women among themselves and 6! 
possible rankings of the men among themselves, it follows from the basic principle 
that there are (6!) (4!) = (720) (24) = 17,280 possible rankings in which the women 
receive the top 4 scores. Hence, the desired probability is 

$$\frac{6!\,4!}{10!} = \frac{4 \cdot 3 \cdot 2 \cdot 1}{10 \cdot 9 \cdot 8 \cdot 7} = \frac{1}{210}$$
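The arithmetic of Example 3.5c can be reproduced with exact fractions. Since 10! rankings are too many to list quickly, this Python sketch checks the same formula by brute force only on smaller hypothetical classes; `formula` and `brute_force` are names we chose.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def formula(women, men):
    """P(women take the top places) = women! * men! / (women + men)!"""
    return Fraction(factorial(women) * factorial(men), factorial(women + men))

def brute_force(women, men):
    people = ["W"] * women + ["M"] * men
    rankings = list(permutations(range(women + men)))   # people labelled by index
    top = sum(1 for r in rankings if all(people[i] == "W" for i in r[:women]))
    return Fraction(top, len(rankings))

print(factorial(10), factorial(6) * factorial(4), formula(4, 6))   # 3628800 17280 1/210
```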

Suppose now that we are interested in determining the number of different groups of 
r objects that could be formed from a total of n objects. For instance, how many different 
groups of three could be selected from the five items A, B, C, D, E? To answer this, reason
as follows. Since there are 5 ways to select the initial item, 4 ways to then select the next
item, and 3 ways to then select the final item, there are thus 5 • 4 • 3 ways of selecting the
group of 3 when the order in which the items are selected is relevant. However, since every 
group of 3, say the group consisting of items A, B, and C, will be counted 6 times (that 
is, all of the permutations ABC, ACB, BAC, BCA, CAB, CBA will be counted when the 
order of selection is relevant), it follows that the total number of different groups that can 
be formed is (5 • 4 • 3)/(3 • 2 • 1) = 10. 
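Python's `itertools` module distinguishes the two kinds of selection directly: `permutations(items, 3)` lists ordered selections and `combinations(items, 3)` lists unordered groups. This sketch, using the letters A through E as the five items, reproduces the counts 5 · 4 · 3 = 60 and 60/3! = 10.

```python
from itertools import combinations, permutations

items = "ABCDE"
ordered = list(permutations(items, 3))    # order of selection matters
unordered = list(combinations(items, 3))  # order of selection ignored
print(len(ordered), len(unordered))  # 60 10

# Collapsing each ordered selection to its set of items leaves the 10 groups,
# each of which appeared 3! = 6 times among the ordered selections.
groups = {frozenset(sel) for sel in ordered}
print(len(groups))  # 10
```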

In general, as n(n − 1) ··· (n − r + 1) represents the number of different ways that a
group of r items could be selected from n items when the order of selection is considered
relevant (since the first one selected can be any one of the n, and the second selected any
one of the remaining n − 1, etc.), and since each group of r items will be counted r! times
in this count, it follows that the number of different groups of r items that could be formed
from a set of n items is

$$\frac{n(n-1)\cdots(n-r+1)}{r!} = \frac{n!}{(n-r)!\,r!}$$



NOTATION AND TERMINOLOGY

We define $\binom{n}{r}$, for r ≤ n, by

$$\binom{n}{r} = \frac{n!}{(n-r)!\,r!}$$

and call $\binom{n}{r}$ the number of combinations of n objects taken r at a time.

Thus $\binom{n}{r}$ represents the number of different groups of size r that can be selected from a
set of size n when the order of selection is not considered relevant. For example, there are

$$\binom{8}{2} = \frac{8 \cdot 7}{2 \cdot 1} = 28$$

different groups of size 2 that can be chosen from a set of 8 people, and

$$\binom{10}{2} = \frac{10 \cdot 9}{2 \cdot 1} = 45$$

different groups of size 2 that can be chosen from a set of 10 people. Also, since 0! = 1,
note that

$$\binom{n}{0} = \binom{n}{n} = 1$$
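The factorial formula for $\binom{n}{r}$ can be compared against `math.comb` (available in Python 3.8 and later). The helper `binom` below is a hypothetical name that implements the factorial expression literally; the loop asserts that both agree, including the edge cases r = 0 and r = n.

```python
from math import comb, factorial


def binom(n, r):
    """n! / ((n - r)! r!): combinations of n objects taken r at a time."""
    return factorial(n) // (factorial(n - r) * factorial(r))


for n, r in [(8, 2), (10, 2), (5, 3), (7, 0), (7, 7)]:
    assert binom(n, r) == comb(n, r)

print(binom(8, 2), binom(10, 2))  # 28 45
```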

EXAMPLE 3.5d A committee of size 5 is to be selected from a group of 6 men and 9 women. 
If the selection is made randomly, what is the probability that the committee consists of 
3 men and 2 women? 

SOLUTION Let us assume that "randomly selected" means that each of the $\binom{15}{5}$ possible
combinations is equally likely to be selected. Hence, since there are $\binom{6}{3}$ possible choices
of 3 men and $\binom{9}{2}$ possible choices of 2 women, it follows that the desired probability is
given by

$$\frac{\binom{6}{3}\binom{9}{2}}{\binom{15}{5}} = \frac{240}{1001}$$
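Under the assumption stated in the solution, that all $\binom{15}{5}$ committees are equally likely, the probability in Example 3.5d can be computed exactly:

```python
from fractions import Fraction
from math import comb

# 3 of the 6 men and 2 of the 9 women, out of all 5-person committees from 15 people.
prob = Fraction(comb(6, 3) * comb(9, 2), comb(15, 5))
print(prob)  # 240/1001
```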
EXAMPLE 3.5e From a set of n items a random sample of size k is to be selected. What is
the probability a given item will be among the k selected?

SOLUTION The number of different selections that contain the given item is $\binom{1}{1}\binom{n-1}{k-1}$.
Hence, the probability that a particular item is among the k selected is

$$\binom{n-1}{k-1}\Big/\binom{n}{k} = \frac{(n-1)!}{(n-k)!\,(k-1)!}\Big/\frac{n!}{(n-k)!\,k!} = \frac{k}{n}$$
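For small n and k the result k/n of Example 3.5e can be confirmed by listing every sample. The helper `prob_item_selected` is a hypothetical name; it counts the samples (generated by `itertools.combinations`) that contain a fixed item, here labeled 0.

```python
from fractions import Fraction
from itertools import combinations


def prob_item_selected(n, k, item=0):
    """Fraction of the C(n, k) equally likely samples that contain `item`."""
    samples = list(combinations(range(n), k))
    hits = sum(1 for s in samples if item in s)
    return Fraction(hits, len(samples))


for n, k in [(5, 2), (8, 3), (10, 7)]:
    assert prob_item_selected(n, k) == Fraction(k, n)

print(prob_item_selected(10, 7))  # 7/10
```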

EXAMPLE 3.5f A basketball team consists of 6 black and 6 white players. The players are to 
be paired in groups of two for the purpose of determining roommates. If the pairings are 
done at random, what is the probability that none of the black players will have a white 
roommate? 

SOLUTION Let us start by imagining that the 6 pairs are numbered; that is, there is a
first pair, a second pair, and so on. Since there are $\binom{12}{2}$ different choices of a first pair; and
for each choice of a first pair there are $\binom{10}{2}$ different choices of a second pair; and for each
choice of the first 2 pairs there are $\binom{8}{2}$ choices for a third pair; and so on, it follows from
the generalized basic principle of counting that there are

$$\binom{12}{2}\binom{10}{2}\binom{8}{2}\binom{6}{2}\binom{4}{2}\binom{2}{2} = \frac{12!}{(2!)^6}$$

ways of dividing the players into a first pair, a second pair, and so on. Hence there are
12!/(2^6 6!) ways of dividing the players into 6 (unordered) pairs of 2 each. Furthermore,
since there are, by the same reasoning, 6!/(2^3 3!) ways of pairing the white players among
themselves and 6!/(2^3 3!) ways of pairing the black players among themselves, it follows that
there are (6!/(2^3 3!))^2 pairings that do not result in any black-white roommate pairs. Hence,
if the pairings are done at random (so that all outcomes are equally likely), then the desired
probability is

$$\left(\frac{6!}{2^3\,3!}\right)^2 \Big/ \frac{12!}{2^6\,6!} = \frac{5}{231} \approx .0216$$

Hence, there are roughly only two chances in a hundred that a random pairing will not 
result in any of the white and black players rooming together. ■ 
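Because there are only 12!/(2^6 6!) = 10,395 unordered pairings, Example 3.5f can be checked by brute force. The recursive generator below is a sketch (the player labels B0 to B5 and W0 to W5 are invented for illustration): it pairs the first remaining player with each possible partner and recurses on the rest, so each unordered pairing is produced exactly once.

```python
from fractions import Fraction


def pairings(players):
    """Yield every way of splitting `players` (a list of even length) into unordered pairs."""
    if not players:
        yield []
        return
    first, rest = players[0], players[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


# Hypothetical labels: "B0".."B5" are black players, "W0".."W5" are white players.
players = [f"B{i}" for i in range(6)] + [f"W{i}" for i in range(6)]
all_pairings = list(pairings(players))
# A pairing qualifies when both members of every pair share the same first letter.
same_color = [p for p in all_pairings if all(a[0] == b[0] for a, b in p)]

print(len(all_pairings), len(same_color))  # 10395 225
prob = Fraction(len(same_color), len(all_pairings))
print(prob, round(float(prob), 4))  # 5/231 0.0216
```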

EXAMPLE 3.5g If n people are present in a room, what is the probability that no two of
them celebrate their birthday on the same day of the year? How large need n be so that
this probability is less than 1/2?

SOLUTION Because each person can celebrate his or her birthday on any one of 365 days,
there are a total of (365)^n possible outcomes. (We are ignoring the possibility of someone
having been born on February 29.) Furthermore, there are (365)(364)(363) ··· (365 − n + 1)
possible outcomes that result in no two of the people having the same birthday. This is so
because the first person could have any one of 365 birthdays, the next person any of the
remaining 364 days, the next any of the remaining 363, and so on. Hence, assuming that
each outcome is equally likely, we see that the desired probability is

$$\frac{(365)(364)(363)\cdots(365-n+1)}{(365)^n}$$



It is a rather surprising fact that when n ≥ 23, this probability is less than 1/2. That is, if
there are 23 or more people in a room, then the probability that at least two of them have
the same birthday exceeds 1/2. Many people are initially surprised by this result, since 23
seems so small in relation to 365, the number of days of the year. However, every pair of
individuals has probability 365/(365)^2 = 1/365 of having the same birthday, and in a group of
23 people there are $\binom{23}{2}$ = 253 different pairs of individuals. Looked at this way, the result
no longer seems so surprising. ■
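The product formula of Example 3.5g is easy to evaluate numerically. This sketch (the function name `prob_all_distinct` is hypothetical, and, like the text, it ignores February 29) finds the smallest n for which the probability of no shared birthday falls below 1/2, and also counts the pairs among 23 people.

```python
from math import comb


def prob_all_distinct(n, days=365):
    """P(no two of n people share a birthday), all days equally likely."""
    p = 1.0
    for i in range(n):
        p *= (days - i) / days  # person i+1 avoids the i birthdays already used
    return p


smallest = next(n for n in range(1, 366) if prob_all_distinct(n) < 0.5)
print(smallest)  # 23
print(prob_all_distinct(22), prob_all_distinct(23))
print(comb(23, 2))  # 253
```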

3.6 CONDITIONAL PROBABILITY 

In this section, we introduce one of the most important concepts in all of probability 
theory — that of conditional probability. Its importance is twofold. In the first place, we 
are often interested in calculating probabilities when some partial information concerning 
the result of the experiment is available, or in recalculating them in light of additional 
information. In such situations, the desired probabilities are conditional ones. Second, as 
a kind of a bonus, it often turns out that the easiest way to compute the probability of an 
event is to first "condition" on the occurrence or nonoccurrence of a secondary event. 

As an illustration of a conditional probability, suppose that one rolls a pair of dice. The
sample space S of this experiment can be taken to be the following set of 36 outcomes

$$S = \{(i, j),\ i = 1, 2, 3, 4, 5, 6,\ j = 1, 2, 3, 4, 5, 6\}$$

where we say that the outcome is (i, j) if the first die lands on side i and the second on
side j. Suppose now that each of the 36 possible outcomes is equally likely to occur and
thus has probability 1/36. (In such a situation we say that the dice are fair.) Suppose further
that we observe that the first die lands on side 3. Then, given this information, what is the
probability that the sum of the two dice equals 8? To calculate this probability, we reason
as follows: Given that the initial die is a 3, there can be at most 6 possible outcomes of our
experiment, namely, (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), and (3, 6). In addition, because
each of these outcomes originally had the same probability of occurring, they should still
have equal probabilities. That is, given that the first die is a 3, then the (conditional)
probability of each of the outcomes (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6) is 1/6, whereas
the (conditional) probability of the other 30 points in the sample space is 0. Hence, the
desired probability will be 1/6.
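The reduced-sample-space argument can be mirrored by listing outcomes. The sketch below enumerates the 36 equally likely outcomes, keeps those in which the first die is 3, and counts how many of them also have sum 8.

```python
from fractions import Fraction

# The 36 equally likely outcomes (i, j) for two fair dice.
S = [(i, j) for i in range(1, 7) for j in range(1, 7)]
F = [s for s in S if s[0] == 3]          # first die lands on 3
EF = [s for s in F if s[0] + s[1] == 8]  # ...and the sum is 8

# Within the reduced sample space F the outcomes remain equally likely.
print(Fraction(len(EF), len(F)))  # 1/6
```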

If we let E and F denote, respectively, the event that the sum of the dice is 8 and the
event that the first die is a 3, then the probability just obtained is called the conditional
probability of E given that F has occurred, and is denoted by

P(E|F)

FIGURE 3.5 P(E|F) = P(EF)/P(F).

A general formula for P(E|F) that is valid for all events E and F is derived in the same
manner as just described. Namely, if the event F occurs, then in order for E to occur it is
necessary that the actual occurrence be a point in both E and F; that is, it must be in EF.
Now, since we know that F has occurred, it follows that F becomes our new (reduced)
sample space and hence the probability that the event EF occurs will equal the probability
of EF relative to the probability of F. That is,

$$P(E|F) = \frac{P(EF)}{P(F)} \tag{3.6.1}$$



Note that Equation 3.6.1 is well defined only when P(F) > 0, and hence P(E|F) is defined
only when P(F) > 0. (See Figure 3.5.)

The definition of conditional probability given by Equation 3.6.1 is consistent with the
interpretation of probability as being a long-run relative frequency. To see this, suppose
that a large number n of repetitions of the experiment are performed. Then, since P(F)
is the long-run proportion of experiments in which F occurs, it follows that F will occur
approximately nP(F) times. Similarly, in approximately nP(EF) of these experiments,
both E and F will occur. Hence, of the approximately nP(F) experiments whose outcome
is in F, approximately nP(EF) of them will also have their outcome in E. That is, for
those experiments whose outcome is in F, the proportion whose outcome is also in E is
approximately

$$\frac{nP(EF)}{nP(F)} = \frac{P(EF)}{P(F)}$$

Since this approximation becomes exact as n becomes larger and larger, it follows that
(3.6.1) gives the appropriate definition of the conditional probability of E given that F has
occurred.
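The relative-frequency interpretation can be illustrated by simulation. The sketch below assumes fair dice and uses Python's `random` module with a fixed seed for reproducibility (the function name `estimate_conditional` is hypothetical). It repeats the two-dice experiment n times and reports, among the repetitions in which F (first die is 3) occurs, the fraction in which E (sum is 8) also occurs. For large n this should be close to P(EF)/P(F) = 1/6, though any single run is only an estimate.

```python
import random


def estimate_conditional(n, seed=1):
    """Simulated relative frequency of E among the repetitions in which F occurs."""
    rng = random.Random(seed)
    count_F = count_EF = 0
    for _ in range(n):
        d1, d2 = rng.randint(1, 6), rng.randint(1, 6)
        if d1 == 3:                # F occurred
            count_F += 1
            if d1 + d2 == 8:       # E occurred as well
                count_EF += 1
    return count_EF / count_F


print(estimate_conditional(100_000))  # close to 1/6, roughly 0.167
```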

EXAMPLE 3.6a A bin contains 5 defective (that immediately fail when put in use), 10 
partially defective (that fail after a couple of hours of use), and 25 acceptable transistors. 
A transistor is chosen at random from the bin and put into use. If it does not immediately 
fail, what is the probability it is acceptable? 
SOLUTION Since the transistor did not immediately fail, we know that it is not one of the
5 defectives and so the desired probability is:

$$P\{\text{acceptable} \mid \text{not defective}\} = \frac{P\{\text{acceptable, not defective}\}}{P\{\text{not defective}\}} = \frac{P\{\text{acceptable}\}}{P\{\text{not defective}\}}$$

where the last equality follows since the transistor will be both acceptable and not defective
if it is acceptable. Hence, assuming that each of the 40 transistors is equally likely to be
chosen, we obtain that

$$P\{\text{acceptable} \mid \text{not defective}\} = \frac{25/40}{35/40} = \frac{5}{7}$$

It should be noted that we could also have derived this probability by working directly
with the reduced sample space. That is, since we know that the chosen transistor is not
defective, the problem reduces to computing the probability that a transistor, chosen
at random from a bin containing 25 acceptable and 10 partially defective transistors, is
acceptable. This is clearly equal to 25/35. ■
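Both routes to the answer of Example 3.6a, the conditional-probability formula and the reduced sample space, can be written out with exact fractions (variable names are illustrative):

```python
from fractions import Fraction

defective, partial, acceptable = 5, 10, 25
total = defective + partial + acceptable  # 40 transistors, each equally likely

# Formula: P{acceptable} / P{not defective}.
p_formula = Fraction(acceptable, total) / Fraction(total - defective, total)
# Reduced sample space: only the 35 non-defective transistors remain.
p_reduced = Fraction(acceptable, partial + acceptable)
print(p_formula, p_reduced)  # 5/7 5/7
```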

EXAMPLE 3.6b The organization that Jones works for is running a father-son dinner for
those employees having at least one son. Each of these employees is invited to attend along
with his youngest son. If Jones is known to have two children, what is the conditional
probability that they are both boys given that he is invited to the dinner? Assume that the
sample space S is given by S = {(b, b), (b, g), (g, b), (g, g)} and all outcomes are equally
likely [(b, g) means, for instance, that the younger child is a boy and the older child is
a girl].

SOLUTION The knowledge that Jones has been invited to the dinner is equivalent to knowing
that he has at least one son. Hence, letting B denote the event that both children are
boys, and A the event that at least one of them is a boy, we have that the desired probability
P(B|A) is given by

$$P(B|A) = \frac{P(BA)}{P(A)} = \frac{P(\{(b, b)\})}{P(\{(b, b), (b, g), (g, b)\})} = \frac{1/4}{3/4} = \frac{1}{3}$$

Many readers incorrectly reason that the conditional probability of two boys given at least
one is 1/2, as opposed to the correct 1/3, since they reason that the Jones child not attending
the dinner is equally likely to be a boy or a girl. Their mistake, however, is in assuming that
these two possibilities are equally likely. Remember that initially there were four equally
likely outcomes. Now the information that at least one child is a boy is equivalent to
knowing that the outcome is not (g, g). Hence we are left with the three equally likely
outcomes (b, b), (b, g), (g, b), thus showing that the Jones child not attending the dinner
is twice as likely to be a girl as a boy. ■
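Example 3.6b can be checked by enumerating the four equally likely outcomes, written as (younger, older) pairs as in the text:

```python
from fractions import Fraction
from itertools import product

S = list(product("bg", repeat=2))       # (younger, older), each with probability 1/4
A = [s for s in S if "b" in s]          # at least one son, so Jones is invited
BA = [s for s in A if s == ("b", "b")]  # both children are boys
print(Fraction(len(BA), len(A)))  # 1/3
```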

By multiplying both sides of Equation 3.6.1 by P(F) we obtain that

$$P(EF) = P(F)P(E|F) \tag{3.6.2}$$

In words, Equation 3.6.2 states that the probability that both E and F occur is equal to 
the probability that F occurs multiplied by the conditional probability of E given that 
F occurred. Equation 3.6.2 is often quite useful in computing the probability of the 
intersection of events. This is illustrated by the following example. 

EXAMPLE 3.6c Ms. Perez figures that there is a 30 percent chance that her company will 
set up a branch office in Phoenix. If it does, she is 60 percent certain that she will be 
made manager of this new operation. What is the probability that Perez will be a Phoenix 
branch office manager? 

SOLUTION If we let B denote the event that the company sets up a branch office in Phoenix 
and M the event that Perez is made the Phoenix manager, then the desired probability is 
P(BM), which is obtained as follows: 

$$P(BM) = P(B)P(M|B) = (.3)(.6) = .18$$

Hence, there is an 18 percent chance that Perez will be the Phoenix manager. ■ 

3.7 BAYES' FORMULA 

Let E and F be events. We may express E as

$$E = EF \cup EF^c$$

for, in order for a point to be in E, it must either be in both E and F or be in E but not in F.
(See Figure 3.6.) As EF and EF^c are clearly mutually exclusive, we have by Axiom 3 that

$$\begin{aligned} P(E) &= P(EF) + P(EF^c) \\ &= P(E|F)P(F) + P(E|F^c)P(F^c) \\ &= P(E|F)P(F) + P(E|F^c)[1 - P(F)] \end{aligned} \tag{3.7.1}$$

FIGURE 3.6 E = EF ∪ EF^c.



Equation 3.7.1 states that the probability of the event E is a weighted average of the 
conditional probability of E given that F has occurred and the conditional probability of
E given that F has not occurred: Each conditional probability is given as much weight as 
the event it is conditioned on has of occurring. It is an extremely useful formula, for its 
use often enables us to determine the probability of an event by first "conditioning" on 
whether or not some second event has occurred. That is, there are many instances where 
it is difficult to compute the probability of an event directly, but it is straightforward to 
compute it once we know whether or not some second event has occurred. 

EXAMPLE 3.7a An insurance company believes that people can be divided into two 
classes — those that are accident prone and those that are not. Their statistics show 
that an accident-prone person will have an accident at some time within a fixed 1-year 
period with probability .4, whereas this probability decreases to .2 for a non-accident-prone 
person. If we assume that 30 percent of the population is accident prone, what is the prob- 
ability that a new policy holder will have an accident within a year of purchasing a policy? 

SOLUTION We obtain the desired probability by first conditioning on whether or not the
policy holder is accident prone. Let A_1 denote the event that the policy holder will have
an accident within a year of purchase; and let A denote the event that the policy holder is
accident prone. Hence, the desired probability, P(A_1), is given by

$$P(A_1) = P(A_1|A)P(A) + P(A_1|A^c)P(A^c) = (.4)(.3) + (.2)(.7) = .26$$ ■
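Equation 3.7.1 is a weighted sum, so conditioning translates into a few lines of code. The sketch below (the helper `total_probability` is a hypothetical name) takes the conditional probabilities and the probabilities of the conditioning events, which are assumed to be mutually exclusive and to sum to 1.

```python
def total_probability(conditionals, priors):
    """P(E) = sum of P(E | F_i) P(F_i) over mutually exclusive, exhaustive F_i."""
    assert abs(sum(priors) - 1) < 1e-12, "priors must sum to 1"
    return sum(c * p for c, p in zip(conditionals, priors))


# Example 3.7a: condition on accident prone (A) versus not accident prone (A^c).
p_A1 = total_probability([0.4, 0.2], [0.3, 0.7])
print(round(p_A1, 4))  # 0.26
```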



In the next series of examples, we will indicate how to reevaluate an initial probability 
assessment in the light of additional (or new) information. That is, we will show how to 
incorporate new information with an initial probability assessment to obtain an updated 
probability. 

EXAMPLE 3.7b Reconsider Example 3.7a and suppose that a new policy holder has an 
accident within a year of purchasing his policy. What is the probability that he is accident 
prone? 

SOLUTION Initially, at the moment when the policy holder purchased his policy, we
assumed there was a 30 percent chance that he was accident prone. That is, P(A) = .3.
However, based on the fact that he has had an accident within a year, we now reevaluate
his probability of being accident prone as follows.

$$P(A|A_1) = \frac{P(AA_1)}{P(A_1)} = \frac{P(A)P(A_1|A)}{P(A_1)} = \frac{(.3)(.4)}{.26} = \frac{6}{13} \approx .4615$$ ■
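The updating in Example 3.7b can be reproduced with exact fractions; the variable names are illustrative only.

```python
from fractions import Fraction

p_A = Fraction(3, 10)            # prior probability of being accident prone
p_A1_given_A = Fraction(4, 10)   # accident within a year if accident prone
p_A1_given_Ac = Fraction(2, 10)  # accident within a year if not accident prone

p_A1 = p_A1_given_A * p_A + p_A1_given_Ac * (1 - p_A)  # Equation 3.7.1: 13/50 = .26
posterior = p_A * p_A1_given_A / p_A1
print(posterior, round(float(posterior), 4))  # 6/13 0.4615
```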



EXAMPLE 3.7c In answering a question on a multiple-choice test, a student either knows 
the answer or she guesses. Let p be the probability that she knows the answer and 1 − p
the probability that she guesses. Assume that a student who guesses at the answer will be 
correct with probability l/m, where m is the number of multiple-choice alternatives. What 
is the conditional probability that a student knew the answer to a question given that she 
answered it correctly? 

SOLUTION Let C and K denote, respectively, the event that the student answers the
question correctly and the event that she actually knows the answer. To compute

$$P(K|C) = \frac{P(KC)}{P(C)}$$

we first note that

$$P(KC) = P(K)P(C|K) = p \cdot 1 = p$$

To compute the probability that the student answers correctly, we condition on whether
or not she knows the answer. That is,

$$P(C) = P(C|K)P(K) + P(C|K^c)P(K^c) = p + (1/m)(1 - p)$$

Hence, the desired probability is given by

$$P(K|C) = \frac{p}{p + (1/m)(1 - p)} = \frac{mp}{1 + (m - 1)p}$$

Thus, for example, if m = 5, p = 1/2, then the probability that a student knew the answer
to a question she correctly answered is 5/6. ■
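The closed form mp/(1 + (m − 1)p) can be cross-checked against a direct computation. The function name `prob_knew_given_correct` is hypothetical; it conditions on K and K^c exactly as in the solution, with P(C|K) = 1 and P(C|K^c) = 1/m.

```python
from fractions import Fraction


def prob_knew_given_correct(p, m):
    """P(K | C): she knew the answer, given that she answered correctly."""
    p = Fraction(p)
    p_correct = p * 1 + (1 - p) * Fraction(1, m)  # condition on K versus K^c
    return p * 1 / p_correct


print(prob_knew_given_correct(Fraction(1, 2), 5))  # 5/6
```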






EXAMPLE 3.7d A laboratory blood test is 99 percent effective in detecting a certain disease 
when it is, in fact, present. However, the test also yields a "false positive" result for 
1 percent of the healthy persons tested. (That is, if a healthy person is tested, then, with 
probability .01, the test result will imply he or she has the disease.) If .5 percent of the 
population actually has the disease, what is the probability a person has the disease given 
that his test result is positive? 

SOLUTION Let D be the event that the tested person has the disease and E the event that
his test result is positive. The desired probability P(D|E) is obtained by

$$\begin{aligned} P(D|E) &= \frac{P(DE)}{P(E)} \\ &= \frac{P(E|D)P(D)}{P(E|D)P(D) + P(E|D^c)P(D^c)} \\ &= \frac{(.99)(.005)}{(.99)(.005) + (.01)(.995)} \\ &\approx .3322 \end{aligned}$$

Thus, only 33 percent of those persons whose test results are positive actually have the 
disease. Since many students are often surprised at this result (because they expected this 
figure to be much higher since the blood test seems to be a good one), it is probably 
worthwhile to present a second argument which, though less rigorous than the foregoing, 
is probably more revealing. We now do so. 

Since .5 percent of the population actually has the disease, it follows that, on the average, 
1 person out of every 200 tested will have it. The test will correctly confirm that this person 
has the disease with probability .99. Thus, on the average, out of every 200 persons tested, 
the test will correctly confirm that .99 person has the disease. On the other hand, out of 
the (on the average) 199 healthy people, the test will incorrectly state that (199) (.01) of 
these people have the disease. Hence, for every .99 diseased person that the test correctly 
states is ill, there are (on the average) 1.99 healthy persons that the test incorrectly states 
are ill. Hence, the proportion of time that the test result is correct when it states that 
a person is ill is 

$$\frac{.99}{.99 + 1.99} \approx .3322$$
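Both the Bayes computation and the frequency argument of Example 3.7d can be written out as a short sketch (the function `posterior_positive` and its parameter names are illustrative):

```python
def posterior_positive(sensitivity, false_positive, prevalence):
    """P(D | E): probability of disease given a positive test, by Bayes' formula."""
    p_pos = sensitivity * prevalence + false_positive * (1 - prevalence)
    return sensitivity * prevalence / p_pos


p = posterior_positive(0.99, 0.01, 0.005)
print(round(p, 4))  # 0.3322

# Frequency view: per 200 people tested, on average 1 is ill and 199 are healthy.
true_pos = 0.99 * 1
false_pos = 0.01 * 199
print(round(true_pos / (true_pos + false_pos), 4))  # 0.3322
```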



Equation 3.7.1 is also useful when one has to reassess one's (personal) probabilities in 
the light of additional information. For instance, consider the following examples. 

EXAMPLE 3.7e At a certain stage of a criminal investigation, the inspector in charge is 
60 percent convinced of the guilt of a certain suspect. Suppose now that a new piece of 
evidence that shows that the criminal has a certain characteristic (such as left-handedness, 
baldness, brown hair, etc.) is uncovered. If 20 percent of the population possesses this 
characteristic, how certain of the guilt of the suspect should the inspector now be if it turns
out that the suspect is among this group? 

SOLUTION Letting G denote the event that the suspect is guilty and C the event that he
possesses the characteristic of the criminal, we have

$$P(G|C) = \frac{P(GC)}{P(C)}$$

Now

$$P(GC) = P(G)P(C|G) = (.6)(1) = .6$$

To compute the probability that the suspect has the characteristic, we condition on whether
or not he is guilty. That is,

$$P(C) = P(C|G)P(G) + P(C|G^c)P(G^c) = (1)(.6) + (.2)(.4) = .68$$

where we have supposed that the probability of the suspect having the characteristic if
he is, in fact, innocent is equal to .2, the proportion of the population possessing the
characteristic. Hence

$$P(G|C) = \frac{60}{68} \approx .882$$

and so the inspector should now be 88 percent certain of the guilt of the suspect. ■

EXAMPLE 3.7e (continued) Let us now suppose that the new evidence is subject to different 
possible interpretations, and in fact only shows that it is 90 percent likely that the criminal 
possesses this certain characteristic. In this case, how likely would it be that the suspect is 
guilty (assuming, as before, that he has this characteristic)? 

SOLUTION In this case, the situation is as before with the exception that the probability
of the suspect having the characteristic given that he is guilty is now .9 (rather than 1).
Hence,

$$P(G|C) = \frac{P(GC)}{P(C)} = \frac{P(G)P(C|G)}{P(C|G)P(G) + P(C|G^c)P(G^c)} = \frac{(.6)(.9)}{(.9)(.6) + (.2)(.4)} = \frac{54}{62} \approx .871$$

which is slightly less than in the previous case (why?). ■
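The two versions of Example 3.7e differ only in P(C|G). A hypothetical helper `posterior_guilt` makes this explicit, assuming as in the text that an innocent suspect has the characteristic with probability .2:

```python
def posterior_guilt(prior, p_char_if_guilty, p_char_if_innocent):
    """P(G | C): probability of guilt once the suspect is seen to have the characteristic."""
    p_char = p_char_if_guilty * prior + p_char_if_innocent * (1 - prior)
    return p_char_if_guilty * prior / p_char


print(round(posterior_guilt(0.6, 1.0, 0.2), 3))  # 0.882 (criminal surely has the trait)
print(round(posterior_guilt(0.6, 0.9, 0.2), 3))  # 0.871 (90 percent evidence)
```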

Equation 3.7.1 may be generalized in the following manner. Suppose that F_1, F_2, ..., F_n
are mutually exclusive events such that

$$\bigcup_{i=1}^{n} F_i = S$$

In other words, exactly one of the events F_1, F_2, ..., F_n must occur. By writing

$$E = \bigcup_{i=1}^{n} EF_i$$

and using the fact that the events EF_i, i = 1, ..., n are mutually exclusive, we obtain that

$$P(E) = \sum_{i=1}^{n} P(EF_i) = \sum_{i=1}^{n} P(E|F_i)P(F_i) \tag{3.7.2}$$

Thus, Equation 3.7.2 shows how, for given events F_1, F_2, ..., F_n of which one and only
one must occur, we can compute P(E) by first "conditioning" on which one of the F_i
occurs. That is, it states that P(E) is equal to a weighted average of P(E|F_i), each term
being weighted by the probability of the event on which it is conditioned.

Suppose now that E has occurred and we are interested in determining which one of
the F_j also occurred. By Equation 3.7.2, we have that

$$P(F_j|E) = \frac{P(EF_j)}{P(E)} = \frac{P(E|F_j)P(F_j)}{\sum_{i=1}^{n} P(E|F_i)P(F_i)} \tag{3.7.3}$$

Equation 3.7.3 is known as Bayes' formula, after the English philosopher Thomas Bayes. If
we think of the events F_j as being possible "hypotheses" about some subject matter, then
Bayes' formula may be interpreted as showing us how opinions about these hypotheses
held before the experiment [that is, the P(F_j)] should be modified by the evidence of the
experiment.
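Equations 3.7.2 and 3.7.3 together give a general recipe: weight each prior P(F_j) by the likelihood P(E|F_j) and divide by the sum of these products. A minimal sketch, assuming the F_j are mutually exclusive and exhaustive (the function name `bayes_posteriors` is hypothetical), applied here to the numbers of Example 3.7b:

```python
from fractions import Fraction


def bayes_posteriors(priors, likelihoods):
    """Return [P(F_j | E) for each j], given P(F_j) and P(E | F_j)."""
    p_E = sum(lik * p for lik, p in zip(likelihoods, priors))          # Equation 3.7.2
    return [lik * p / p_E for lik, p in zip(likelihoods, priors)]      # Equation 3.7.3


# F_1 = accident prone, F_2 = not accident prone; E = accident within a year.
post = bayes_posteriors([Fraction(3, 10), Fraction(7, 10)],
                        [Fraction(4, 10), Fraction(2, 10)])
print(post)  # [Fraction(6, 13), Fraction(7, 13)]
```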

EXAMPLE 3.7f A plane is missing and it is presumed that it was equally likely to have
gone down in any of three possible regions. Let 1 − α_i denote the probability the plane
will be found upon a search of the ith region when the plane is, in fact, in that region,
i = 1, 2, 3. (The constants α_i are called overlook probabilities because they represent the
probability of overlooking the plane; they are generally attributable to the geographical
and environmental conditions of the regions.) What is the conditional probability that the
plane is in the ith region, given that a search of region 1 is unsuccessful, i = 1, 2, 3?

SOLUTION Let R_i, i = 1, 2, 3, be the event that the plane is in region i; and let E be the
event that a search of region 1 is unsuccessful. From Bayes' formula, we obtain

$$P(R_1|E) = \frac{P(ER_1)}{P(E)} = \frac{P(E|R_1)P(R_1)}{\sum_{i=1}^{3} P(E|R_i)P(R_i)} = \frac{(\alpha_1)(1/3)}{(\alpha_1)(1/3) + (1)(1/3) + (1)(1/3)} = \frac{\alpha_1}{\alpha_1 + 2}$$

For j = 2, 3,

$$P(R_j|E) = \frac{P(E|R_j)P(R_j)}{P(E)} = \frac{(1)(1/3)}{(\alpha_1)(1/3) + 1/3 + 1/3} = \frac{1}{\alpha_1 + 2}, \qquad j = 2, 3$$

Thus, for instance, if α_1 = .4, then the conditional probability that the plane is in region
1 given that a search of that region did not uncover it is 1/6. ■
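Example 3.7f is a direct application of Bayes' formula with three hypotheses. In the sketch below (the helper name `region_posteriors` is hypothetical), P(E|R_1) = α_1 while P(E|R_2) = P(E|R_3) = 1, since a search of region 1 cannot find a plane located elsewhere.

```python
from fractions import Fraction


def region_posteriors(alpha1):
    """[P(R_i | E) for i = 1, 2, 3] after an unsuccessful search of region 1."""
    priors = [Fraction(1, 3)] * 3                               # regions equally likely
    likelihoods = [Fraction(alpha1), Fraction(1), Fraction(1)]  # P(E | R_i)
    p_E = sum(lik * p for lik, p in zip(likelihoods, priors))
    return [lik * p / p_E for lik, p in zip(likelihoods, priors)]


print(region_posteriors(Fraction(2, 5)))  # [Fraction(1, 6), Fraction(5, 12), Fraction(5, 12)]
```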

3.8 INDEPENDENT EVENTS 

The previous examples in this chapter show that P(E|F), the conditional probability of
E given F, is not generally equal to P(E), the unconditional probability of E. In other
words, knowing that F has occurred generally changes the chances of E's occurrence. In
the special cases where P(E|F) does in fact equal P(E), we say that E is independent
of F. That is, E is independent of F if knowledge that F has occurred does not change
the probability that E occurs.

Since P(E|F) = P(EF)/P(F), we see that E is independent of F if

$$P(EF) = P(E)P(F) \tag{3.8.1}$$

Since this equation is symmetric in E and F, it shows that whenever E is independent of
F so is F of E. We thus have the following.

Definition 

Two events E and F are said to be independent if Equation 3.8.1 holds. Two events E 
and F that are not independent are said to be dependent. 

EXAMPLE 3.8a A card is selected at random from an ordinary deck of 52 playing cards. If
A is the event that the selected card is an ace and H is the event that it is a heart, then A
and H are independent, since P(AH) = 1/52, while P(A) = 4/52 and P(H) = 13/52. ■
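The independence claim of Example 3.8a can be checked by listing all 52 cards; the rank and suit labels below are just one convenient encoding.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))  # 52 equally likely cards


def prob(event):
    """Probability of an event given as a predicate on cards."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))


p_A = prob(lambda c: c[0] == "A")
p_H = prob(lambda c: c[1] == "hearts")
p_AH = prob(lambda c: c[0] == "A" and c[1] == "hearts")
print(p_AH, p_A * p_H)  # 1/52 1/52
```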

EXAMPLE 3.8b If we let E denote the event that the next president is a Republican and
F the event that there will be a major earthquake within the next year, then most people
would probably be willing to assume that E and F are independent. However, there would
probably be some controversy over whether it is reasonable to assume that E is independent
of G, where G is the event that there will be a recession within the next two years. ■

We now show that if E is independent of F then E is also independent of F^c.

PROPOSITION 3.8.1 If E and F are independent, then so are E and F^c.

Proof

Assume that E and F are independent. Since E = EF ∪ EF^c, and EF and EF^c are
obviously mutually exclusive, we have that

$$\begin{aligned} P(E) &= P(EF) + P(EF^c) \\ &= P(E)P(F) + P(EF^c) \quad \text{by the independence of } E \text{ and } F \end{aligned}$$

or equivalently,

$$P(EF^c) = P(E)(1 - P(F)) = P(E)P(F^c)$$

and the result is proven. □

Thus if E is independent of F, then the probability of E's occurrence is unchanged by
information as to whether or not F has occurred.

Suppose now that E is independent of F and is also independent of G. Is E then
necessarily independent of FG? The answer, somewhat surprisingly, is no. Consider the
following example.

EXAMPLE 3.8c Two fair dice are thrown. Let E_7 denote the event that the sum of the dice
is 7. Let F denote the event that the first die equals 4 and let G be the event that the
second die equals 3. Now it can be shown (see Problem 36) that E_7 is independent of
F and that E_7 is also independent of G; but clearly E_7 is not independent of FG [since
P(E_7|FG) = 1]. ■
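Example 3.8c can be confirmed by enumerating the 36 outcomes. The sketch represents each event as a predicate (the names `e7`, `f`, `g`, and `both` are illustrative) and compares the probability of each intersection with the product of the individual probabilities.

```python
from fractions import Fraction

S = [(i, j) for i in range(1, 7) for j in range(1, 7)]  # two fair dice


def prob(event):
    """Probability of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for s in S if event(s)), len(S))


def both(a, b):
    """Intersection of two events."""
    return lambda s: a(s) and b(s)


def e7(s):  # sum of the dice is 7
    return s[0] + s[1] == 7


def f(s):   # first die equals 4
    return s[0] == 4


def g(s):   # second die equals 3
    return s[1] == 3


fg = both(f, g)
print(prob(both(e7, f)) == prob(e7) * prob(f))    # True
print(prob(both(e7, g)) == prob(e7) * prob(g))    # True
print(prob(both(e7, fg)) == prob(e7) * prob(fg))  # False, since P(E_7 | FG) = 1
```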

It would appear to follow from the foregoing example that an appropriate definition 
of the independence of three events E, F, and G would have to go further than merely 
assuming that all of the $\binom{3}{2}$ pairs of events are independent. We are thus led to the following
definition. 

Definition 

The three events E, F, and G are said to be independent if

$$\begin{aligned} P(EFG) &= P(E)P(F)P(G) \\ P(EF) &= P(E)P(F) \\ P(EG) &= P(E)P(G) \\ P(FG) &= P(F)P(G) \end{aligned}$$

It should be noted that if the events E, F, G are independent, then E will be independent
of any event formed from F and G. For instance, E is independent of F ∪ G since

$$\begin{aligned} P(E(F \cup G)) &= P(EF \cup EG) \\ &= P(EF) + P(EG) - P(EFG) \\ &= P(E)P(F) + P(E)P(G) - P(E)P(FG) \\ &= P(E)[P(F) + P(G) - P(FG)] \\ &= P(E)P(F \cup G) \end{aligned}$$

Of course we may also extend the definition of independence to more than three
events. The events E_1, E_2, ..., E_n are said to be independent if for every subset
E_{1'}, E_{2'}, ..., E_{r'}, r ≤ n, of these events

$$P(E_{1'}E_{2'}\cdots E_{r'}) = P(E_{1'})P(E_{2'})\cdots P(E_{r'})$$
FIGURE 3.7 Parallel system: functions if current flows from A to B.



It is sometimes the case that the probability experiment under consideration consists of 
performing a sequence of subexperiments. For instance, if the experiment consists of 
continually tossing a coin, then we may think of each toss as being a subexperiment. In 
many cases it is reasonable to assume that the outcomes of any group of the subexperiments 
have no effect on the probabilities of the outcomes of the other subexperiments. If such is 
the case, then we say that the subexperiments are independent. 

EXAMPLE 3.8d A system composed of n separate components is said to be a parallel system
if it functions when at least one of the components functions. (See Figure 3.7.) For such
a system, if component i, independent of other components, functions with probability
p_i, i = 1, ..., n, what is the probability the system functions?

SOLUTION Let A_i denote the event that component i functions. Then

$$\begin{aligned} P\{\text{system functions}\} &= 1 - P\{\text{system does not function}\} \\ &= 1 - P\{\text{all components do not function}\} \\ &= 1 - P(A_1^c A_2^c \cdots A_n^c) \\ &= 1 - \prod_{i=1}^{n} (1 - p_i) \quad \text{by independence} \end{aligned}$$ ■
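For the parallel system of Example 3.8d, the formula and a Monte Carlo check fit in a few lines. The component reliabilities [0.5, 0.7, 0.9] and the function names are hypothetical, chosen for illustration, and the simulation assumes independent components, as the example does.

```python
import random
from math import prod


def parallel_reliability(p):
    """P(system functions) = 1 - prod(1 - p_i) for independent components."""
    return 1 - prod(1 - pi for pi in p)


def simulate(p, trials=100_000, seed=0):
    """Fraction of simulated trials in which at least one component functions."""
    rng = random.Random(seed)
    works = sum(any(rng.random() < pi for pi in p) for _ in range(trials))
    return works / trials


p = [0.5, 0.7, 0.9]  # hypothetical component reliabilities
print(parallel_reliability(p))  # about 0.985
print(simulate(p))              # a simulation estimate near 0.985
```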



EXAMPLE 3.8e A set of k coupons, each of which is independently a type j coupon with
probability p_j, $\sum_j p_j = 1$, is collected. Find the probability that the set contains
a type j coupon given that it contains a type i, i ≠ j.

SOLUTION Let A_r be the event that the set contains a type r coupon. Then

$$P(A_j|A_i) = \frac{P(A_iA_j)}{P(A_i)}$$

To compute P(A_i) and P(A_iA_j), consider the probability of their complements:

$$\begin{aligned} P(A_i) &= 1 - P(A_i^c) \\ &= 1 - P\{\text{no coupon is type } i\} \\ &= 1 - (1 - p_i)^k \end{aligned}$$

$$\begin{aligned} P(A_iA_j) &= 1 - P(A_i^c \cup A_j^c) \\ &= 1 - [P(A_i^c) + P(A_j^c) - P(A_i^cA_j^c)] \\ &= 1 - (1 - p_i)^k - (1 - p_j)^k + P\{\text{no coupon is type } i \text{ or type } j\} \\ &= 1 - (1 - p_i)^k - (1 - p_j)^k + (1 - p_i - p_j)^k \end{aligned}$$

where the final equality follows because each of the k coupons is, independently, of
neither type i nor type j with probability 1 − p_i − p_j. Consequently,

$$P(A_j|A_i) = \frac{1 - (1 - p_i)^k - (1 - p_j)^k + (1 - p_i - p_j)^k}{1 - (1 - p_i)^k}$$ ■
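The final expression in Example 3.8e can be checked against a brute-force computation that sums the probabilities of all possible type sequences of k coupons. The type probabilities (1/2, 1/3, 1/6) and the helper names are hypothetical; types are indexed from 0 in the code.

```python
from fractions import Fraction
from itertools import product
from math import prod


def cond_prob_formula(p, i, j, k):
    """P(A_j | A_i) from the closed-form expression in Example 3.8e."""
    pi, pj = p[i], p[j]
    num = 1 - (1 - pi) ** k - (1 - pj) ** k + (1 - pi - pj) ** k
    return num / (1 - (1 - pi) ** k)


def cond_prob_brute(p, i, j, k):
    """Enumerate every sequence of k coupon types and condition directly."""
    p_i = p_ij = Fraction(0)
    for types in product(range(len(p)), repeat=k):
        weight = prod(p[t] for t in types)  # independent coupons
        if i in types:
            p_i += weight
            if j in types:
                p_ij += weight
    return p_ij / p_i


p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]  # hypothetical type probabilities
print(cond_prob_formula(p, 0, 1, 4), cond_prob_brute(p, 0, 1, 4))
```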



Problems 

1. A box contains three marbles — one red, one green, and one blue. Consider an 
experiment that consists of taking one marble from the box, then replacing it in 
the box and drawing a second marble from the box. Describe the sample space. 
Repeat for the case in which the second marble is drawn without first replacing 
the first marble. 

2. An experiment consists of tossing a coin three times. What is the sample space 
of this experiment? Which event corresponds to the experiment resulting in more 
heads than tails? 

3. Let S = {1, 2, 3, 4, 5, 6, 7}, E = {1, 3, 5, 7}, F = {7, 4, 6}, G = {1, 4}. Find

(a) EF; (b) E ∪ FG; (c) EG^c; (d) EF^c ∪ G; (e) E^c(F ∪ G); (f) EG ∪ FG.

4. Two dice are thrown. Let E be the event that the sum of the dice is odd, let F be
the event that the first die lands on 1, and let G be the event that the sum is 5.
Describe the events EF, E ∪ F, FG, EF^c, EFG.

5. A system is composed of four components, each of which is either working or failed. 
Consider an experiment that consists of observing the status of each component, 
and let the outcome of the experiment be given by the vector (x\,X2, x$, X4) where 
Xj is equal to 1 if component i is working and is equal to if component i is failed. 

(a) How many outcomes are in the sample space of this experiment? 

(b) Suppose that the system will work if components 1 and 2 are both working, 
or if components 3 and 4 are both working. Specify all the outcomes in the 
event that the system works. 

(c) Let E be the event that components 1 and 3 are both failed. How many 
outcomes are contained in event E?
6. Let E, F, G be three events. Find expressions for the events that, of E, F, G,

(a) only E occurs; 

(b) both E and G but not F occur; 

(c) at least one of the events occurs; 

(d) at least two of the events occur; 

(e) all three occur; 

(f ) none of the events occurs; 

(g) at most one of them occurs; 
(h) at most two of them occur; 
(i) exactly two of them occur; 
(j) at most three of them occur. 

7. Find simple expressions for the events 

(a) $E \cup E^c$;

(b) $EE^c$;

(c) $(E \cup F)(E \cup F^c)$;

(d) $(E \cup F)(E^c \cup F)(E \cup F^c)$;

(e) $(E \cup F)(F \cup G)$.

8. Use Venn diagrams (or any other method) to show that 

(a) $EF \subset E$, $E \subset E \cup F$;

(b) if $E \subset F$ then $F^c \subset E^c$;

(c) the commutative laws are valid;

(d) the associative laws are valid;

(e) $F = FE \cup FE^c$;

(f) $E \cup F = E \cup E^cF$;

(g) DeMorgan's laws are valid. 

9. For the following Venn diagram, describe in terms of E, F, and G the events 
denoted in the diagram by the Roman numerals I through VII. 



[Venn diagram: three overlapping circles E, F, and G, with regions labeled I through VII]
10. Show that if $E \subset F$ then $P(E) \le P(F)$. (Hint: Write F as the union of two
mutually exclusive events, one of them being E.)

11. Prove Boole's inequality, namely that

$$P\left(\bigcup_{i=1}^{n} E_i\right) \le \sum_{i=1}^{n} P(E_i)$$

12. If $P(E) = .9$ and $P(F) = .9$, show that $P(EF) \ge .8$. In general, prove
Bonferroni's inequality, namely that

$$P(EF) \ge P(E) + P(F) - 1$$



13. Prove that 



(a) $P(EF^c) = P(E) - P(EF)$

(b) $P(E^cF^c) = 1 - P(E) - P(F) + P(EF)$

14. Show that the probability that exactly one of the events E or F occurs is equal to 
$P(E) + P(F) - 2P(EF)$.

15. Calculate ©,Q, ©,©,(-). 

16. Show that

$$\binom{n}{r} = \binom{n}{n-r}$$

Now present a combinatorial argument for the foregoing by explaining why a 
choice of r items from a set of size n is equivalent to a choice of n — r items from 
that set. 

17. Show that

$$\binom{n}{r} = \binom{n-1}{r-1} + \binom{n-1}{r}$$

For a combinatorial argument, consider a set of n items and fix attention on one 
of these items. How many different sets of size r contain this item, and how many 
do not? 

18. A group of 5 boys and 10 girls is lined up in random order — that is, each of the 
15! permutations is assumed to be equally likely. 

(a) What is the probability that the person in the 4th position is a boy? 

(b) What about the person in the 12th position? 

(c) What is the probability that a particular boy is in the 3rd position? 

19. Consider a set of 23 unrelated people. Because each pair of people shares the same
birthday with probability 1/365, and there are $\binom{23}{2} = 253$ pairs, why isn't the
probability that at least two people have the same birthday equal to 253/365?
20. A town contains 4 television repairmen. If 4 sets break down, what is the probability
that exactly 2 of the repairmen are called? What assumptions are you making?

21. A woman has n keys, of which one will open her door. If she tries the keys at 
random, discarding those that do not work, what is the probability that she will 
open the door on her kth try? What if she does not discard previously tried keys?

22. A closet contains 8 pairs of shoes. If 4 shoes are randomly selected, what is the 
probability that there will be (a) no complete pair and (b) exactly 1 complete pair? 

23. Of three cards, one is painted red on both sides; one is painted black on both sides; 
and one is painted red on one side and black on the other. A card is randomly 
chosen and placed on a table. If the side facing up is red, what is the probability 
that the other side is also red? 

24. A couple has 2 children. What is the probability that both are girls if the eldest is 
a girl? 

25. Fifty-two percent of the students at a certain college are females. Five percent of 
the students in this college are majoring in computer science. Two percent of 
the students are women majoring in computer science. If a student is selected at 
random, find the conditional probability that 

(a) this student is female, given that the student is majoring in computer science; 

(b) this student is majoring in computer science, given that the student is female. 

26. A total of 500 married working couples were polled about their annual salaries, 
with the following information resulting. 



                          Husband
Wife                      Less than $25,000    More than $25,000
Less than $25,000               212                  198
More than $25,000                36                   54



Thus, for instance, in 36 of the couples the wife earned more and the husband 
earned less than $25,000. If one of the couples is randomly chosen, what is 

(a) the probability that the husband earns less than $25,000; 

(b) the conditional probability that the wife earns more than $25,000 given that 
the husband earns more than this amount; 

(c) the conditional probability that the wife earns more than $25,000 given that 
the husband earns less than this amount? 

27. There are two local factories that produce radios. Each radio produced at factory A
is defective with probability .05, whereas each one produced at factory B is defective
with probability .01. Suppose you purchase two radios that were produced at the
same factory, which is equally likely to have been either factory A or factory B. If
the first radio that you check is defective, what is the conditional probability that 
the other one is also defective? 

28. A red die, a blue die, and a yellow die (all six-sided) are rolled. We are interested 
in the probability that the number appearing on the blue die is less than that 
appearing on the yellow die which is less than that appearing on the red die. (That 
is, if B (R) [Y] is the number appearing on the blue (red) [yellow] die, then we are 
interested in P(B < Y < R).) 

(a) What is the probability that no two of the dice land on the same number? 

(b) Given that no two of the dice land on the same number, what is the conditional 
probability that B < Y < R? 

(c) What is $P(B < Y < R)$?

(d) If we regard the outcome of the experiment as the vector (B, R, Y), how many
outcomes are there in the sample space? 

(e) Without using the answer to (c), determine the number of outcomes that 
result in B < Y < R. 

(f) Use the results of parts (d) and (e) to verify your answer to part (c). 

29. You ask your neighbor to water a sickly plant while you are on vacation. Without 
water it will die with probability .8; with water it will die with probability .15. You 
are 90 percent certain that your neighbor will remember to water the plant. 

(a) What is the probability that the plant will be alive when you return? 

(b) If it is dead, what is the probability your neighbor forgot to water it? 

30. Two balls, each equally likely to be colored either red or blue, are put in an urn. 
At each stage one of the balls is randomly chosen, its color is noted, and it is 
then returned to the urn. If the first two balls chosen are colored red, what is the 
probability that 

(a) both balls in the urn are colored red; 

(b) the next ball chosen will be red? 

31. A total of 600 of the 1,000 people in a retirement community classify themselves as 
Republicans, while the others classify themselves as Democrats. In a local election 
in which everyone voted, 60 Republicans voted for the Democratic candidate, 
and 50 Democrats voted for the Republican candidate. If a randomly chosen 
community member voted for the Republican, what is the probability that she or 
he is a Democrat? 

32. Each of 2 balls is painted black or gold and then placed in an urn. Suppose that each 
ball is colored black with probability 1/2 and that these events are independent.

(a) Suppose that you obtain information that the gold paint has been used (and 
thus at least one of the balls is painted gold). Compute the conditional 
probability that both balls are painted gold. 
(b) Suppose, now, that the urn tips over and 1 ball falls out. It is painted gold.
What is the probability that both balls are gold in this case? Explain. 

33. Each of 2 cabinets identical in appearance has 2 drawers. Cabinet A contains a 
silver coin in each drawer, and cabinet B contains a silver coin in one of its drawers 
and a gold coin in the other. A cabinet is randomly selected, one of its drawers 
is opened, and a silver coin is found. What is the probability that there is a silver 
coin in the other drawer? 

34. Prostate cancer is the most common type of cancer found in males. As an indicator 
of whether a male has prostate cancer, doctors often perform a test that measures 
the level of the PSA protein (prostate specific antigen) that is produced only by 
the prostate gland. Although higher PSA levels are indicative of cancer, the test 
is notoriously unreliable. Indeed, the probability that a noncancerous man will 
have an elevated PSA level is approximately .135, with this probability increasing 
to approximately .268 if the man does have cancer. If, based on other factors, 
a physician is 70 percent certain that a male has prostate cancer, what is the 
conditional probability that he has the cancer given that 

(a) the test indicates an elevated PSA level; 

(b) the test does not indicate an elevated PSA level? 

Repeat the preceding, this time assuming that the physician initially believes there 
is a 30 percent chance the man has prostate cancer. 

35. Suppose that an insurance company classifies people into one of three classes — 
good risks, average risks, and bad risks. Their records indicate that the probabilities 
that good, average, and bad risk persons will be involved in an accident over a 
1-year span are, respectively, .05, .15, and .30. If 20 percent of the population 
are "good risks," 50 percent are "average risks," and 30 percent are "bad risks," 
what proportion of people have accidents in a fixed year? If policy holder A had 
no accidents in 1987, what is the probability that he or she is a good (average) 
risk? 

36. A pair of fair dice is rolled. Let E denote the event that the sum of the dice is equal 
to 7. 

(a) Show that E is independent of the event that the first die lands on 4. 

(b) Show that E is independent of the event that the second die lands on 3.

37. The probability of the closing of the ith relay in the circuits shown is given by
$p_i$, i = 1, 2, 3, 4, 5. If all relays function independently, what is the probability
that a current flows between A and B for the respective circuits?

38. An engineering system consisting of n components is said to be a k-out-of-n
system ($k \le n$) if the system functions if and only if at least k of the n
components function. Suppose that all components function independently of
each other.

(a) If the ith component functions with probability $P_i$, i = 1, 2, 3, 4, compute
the probability that a 2-out-of-4 system functions. 

(b) Repeat (a) for a 3-out-of-5 system. 

39. Five independent flips of a fair coin are made. Find the probability that 

(a) the first three flips are the same; 

(b) either the first three flips are the same, or the last three flips are the same; 

(c) there are at least two heads among the first three flips, and at least two tails 
among the last three flips. 

40. Suppose that n independent trials, each of which results in any of the outcomes 
0, 1, or 2, with respective probabilities .3, .5, and .2, are performed. Find the 
probability that both outcome 1 and outcome 2 occur at least once. (Hint: Consider
the complementary probability.)

41. A parallel system functions whenever at least one of its components works. Consider
a parallel system of n components, and suppose that each component independently
works with probability 1/2. Find the conditional probability that component
1 works, given that the system is functioning.

42. A certain organism possesses a pair of each of 5 different genes (which we will 
designate by the first 5 letters of the English alphabet). Each gene appears in 2 
forms (which we designate by lowercase and capital letters). The capital letter will
be assumed to be the dominant gene in the sense that if an organism possesses 
the gene pair xX, then it will outwardly have the appearance of the X gene. For 
instance, if X stands for brown eyes and x for blue eyes, then an individual having 
either gene pair XX or xX will have brown eyes, whereas one having gene pair 
xx will be blue-eyed. The characteristic appearance of an organism is called its
phenotype, whereas its genetic constitution is called its genotype. (Thus 2 organisms
with respective genotypes aA, bB, cc, dD, ee and AA, BB, cc, DD, ee would
have different genotypes but the same phenotype.) In a mating between 2 organisms
each one contributes, at random, one of its gene pairs of each type. The 5
contributions of an organism (one of each of the 5 types) are assumed to be independent
and are also independent of the contributions of its mate. In a mating
between organisms having genotypes aA, bB, cC, dD, eE, and aa, bB, cc, Dd, ee,
what is the probability that the progeny will (1) phenotypically, (2) genotypically
resemble

(a) the first parent; 

(b) the second parent; 

(c) either parent; 

(d) neither parent? 

43. Three prisoners are informed by their jailer that one of them has been chosen at 
random to be executed, and the other two are to be freed. Prisoner A asks the jailer
to tell him privately which of his fellow prisoners will be set free, claiming that
there would be no harm in divulging this information because he already knows
that at least one of the two will go free. The jailer refuses to answer this question,
pointing out that if A knew which of his fellow prisoners were to be set free, then
his own probability of being executed would rise from 1/3 to 1/2 because he would
then be one of two prisoners. What do you think of the jailer's reasoning? 

44. Although both my parents have brown eyes, I have blue eyes. What is the 
probability that my sister has blue eyes? 

45. A set of k coupons, each of which is independently a type j coupon with probability
$p_j$, $\sum_{j=1}^{n} p_j = 1$, is collected. Find the probability that the set contains either
a type i or a type j coupon.
RANDOM VARIABLES AND EXPECTATION



4.1 RANDOM VARIABLES 

When a random experiment is performed, we are often not interested in all of the details 
of the experimental result but only in the value of some numerical quantity determined 
by the result. For instance, in tossing dice we are often interested in the sum of the two 
dice and are not really concerned about the values of the individual dice. That is, we may 
be interested in knowing that the sum is 7 and not be concerned over whether the actual 
outcome was (1, 6) or (2, 5) or (3, 4) or (4, 3) or (5, 2) or (6, 1). Also, a civil engineer 
may not be directly concerned with the daily risings and declines of the water level of 
a reservoir (which we can take as the experimental result) but may only care about the 
level at the end of a rainy season. These quantities of interest that are determined by the 
result of the experiment are known as random variables. 

Since the value of a random variable is determined by the outcome of the experiment, 
we may assign probabilities to its possible values.

EXAMPLE 4.1a Letting X denote the random variable that is defined as the sum of two fair
dice, then

$$\begin{aligned}
P\{X = 2\} &= P\{(1,1)\} = \tfrac{1}{36} \\
P\{X = 3\} &= P\{(1,2),(2,1)\} = \tfrac{2}{36} \\
P\{X = 4\} &= P\{(1,3),(2,2),(3,1)\} = \tfrac{3}{36} \\
P\{X = 5\} &= P\{(1,4),(2,3),(3,2),(4,1)\} = \tfrac{4}{36} \\
P\{X = 6\} &= P\{(1,5),(2,4),(3,3),(4,2),(5,1)\} = \tfrac{5}{36} \\
P\{X = 7\} &= P\{(1,6),(2,5),(3,4),(4,3),(5,2),(6,1)\} = \tfrac{6}{36} \\
P\{X = 8\} &= P\{(2,6),(3,5),(4,4),(5,3),(6,2)\} = \tfrac{5}{36} \\
P\{X = 9\} &= P\{(3,6),(4,5),(5,4),(6,3)\} = \tfrac{4}{36} \\
P\{X = 10\} &= P\{(4,6),(5,5),(6,4)\} = \tfrac{3}{36} \\
P\{X = 11\} &= P\{(5,6),(6,5)\} = \tfrac{2}{36} \\
P\{X = 12\} &= P\{(6,6)\} = \tfrac{1}{36}
\end{aligned} \tag{4.1.1}$$

In other words, the random variable X can take on any integral value between 2 and 12,
and the probability that it takes on each value is given by Equation 4.1.1. Since X must
take on some value, we must have

$$1 = P\left(\bigcup_{i=2}^{12} \{X = i\}\right) = \sum_{i=2}^{12} P\{X = i\}$$

which is easily verified from Equation 4.1.1.

Another random variable of possible interest in this experiment is the value of the first 
die. Letting Y denote this random variable, then Y is equally likely to take on any of the 
values 1 through 6. That is, 

$$P\{Y = i\} = \tfrac{1}{6}, \qquad i = 1, 2, 3, 4, 5, 6$$

■
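Equation 4.1.1 can be reproduced by enumerating the 36 equally likely outcomes. The sketch below is one way to do this in Python, under the assumption that both dice are fair; `Fraction` stores exact rationals in lowest terms, so, for example, 2/36 is displayed as 1/18.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Group the 36 equally likely outcomes (i, j) by their sum X = i + j.
counts = Counter(i + j for i, j in product(range(1, 7), repeat=2))
pmf = {x: Fraction(c, 36) for x, c in sorted(counts.items())}

for x, p in pmf.items():
    print(x, p)

# The probabilities over x = 2, ..., 12 must add to 1.
assert sum(pmf.values()) == 1
```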

EXAMPLE 4.1b Suppose that an individual purchases two electronic components each of 
which may be either defective or acceptable. In addition, suppose that the four possible 
results (d, d), (d, a), (a, d), (a, a) have respective probabilities .09, .21, .21, .49
[where (d, d) means that both components are defective, (d, a) that the first component 
is defective and the second acceptable, and so on]. If we let X denote the number of 
acceptable components obtained in the purchase, then X is a random variable taking on 
one of the values 0, 1, 2 with respective probabilities 

P{X = 0} = .09 
P{X= 1} = .42 
P{X = 2} = .49 

If we were mainly concerned with whether there was at least one acceptable component, 
we could define the random variable I by

$$I = \begin{cases} 1 & \text{if } X = 1 \text{ or } 2 \\ 0 & \text{if } X = 0 \end{cases}$$

If A denotes the event that at least one acceptable component is obtained, then the random
variable I is called the indicator random variable for the event A, since I will equal 1
or 0 depending upon whether A occurs. The probabilities attached to the possible values
of I are

P{I = 1} = .91

P{I = 0} = .09 ■
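The passage from outcome probabilities to the distribution of X and of the indicator I can be mirrored directly. Here is a minimal Python sketch using the probabilities of Example 4.1b; the dictionary layout and variable names are illustrative choices, not notation from the text.

```python
# Example 4.1b: probabilities of the four outcomes ('d' defective, 'a' acceptable).
outcome_prob = {("d", "d"): 0.09, ("d", "a"): 0.21,
                ("a", "d"): 0.21, ("a", "a"): 0.49}

# X = number of acceptable components: add up outcomes with the same count.
pmf_X = {}
for outcome, p in outcome_prob.items():
    x = outcome.count("a")
    pmf_X[x] = pmf_X.get(x, 0.0) + p

# I = indicator of A = {at least one acceptable component}.
p_I1 = sum(p for x, p in pmf_X.items() if x >= 1)
print(pmf_X)
print("P{I = 1} =", p_I1, " P{I = 0} =", 1 - p_I1)
```

Floating-point sums may differ from .42 or .91 in the last few digits.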

In the two foregoing examples, the random variables of interest took on a finite number
of possible values. Random variables whose set of possible values can be written either
as a finite sequence $x_1, \ldots, x_n$, or as an infinite sequence $x_1, x_2, \ldots$ are said to be discrete. For
instance, a random variable whose set of possible values is the set of nonnegative integers
is a discrete random variable. However, there also exist random variables that take on
a continuum of possible values. These are known as continuous random variables. One
example is the random variable denoting the lifetime of a car, when the car's lifetime is
assumed to take on any value in some interval (a, b).

The cumulative distribution function, or more simply the distribution function, F of the 
random variable X is defined for any real number x by 

$$F(x) = P\{X \le x\}$$

That is, F(x) is the probability that the random variable X takes on a value that is less than 
or equal to x. 

Notation: We will use the notation X ~ F to signify that F is the distribution function
of X.

All probability questions about X can be answered in terms of its distribution function
F. For example, suppose we wanted to compute $P\{a < X \le b\}$. This can be accomplished
by first noting that the event $\{X \le b\}$ can be expressed as the union of the two mutually
exclusive events $\{X \le a\}$ and $\{a < X \le b\}$. Therefore, applying Axiom 3, we obtain that

$$P\{X \le b\} = P\{X \le a\} + P\{a < X \le b\}$$

or

$$P\{a < X \le b\} = F(b) - F(a)$$

EXAMPLE 4.1c Suppose the random variable X has distribution function 



$$F(x) = \begin{cases} 0 & x \le 0 \\ 1 - \exp\{-x^2\} & x > 0 \end{cases}$$

What is the probability that X exceeds 1?
SOLUTION The desired probability is computed as follows:

$$\begin{aligned}
P\{X > 1\} &= 1 - P\{X \le 1\} \\
&= 1 - F(1) \\
&= e^{-1} \\
&= .368
\end{aligned}$$

■
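A quick numerical version of this calculation, sketched in Python with the distribution function of Example 4.1c. The last two lines also illustrate the identity P{a < X ≤ b} = F(b) − F(a) for the arbitrarily chosen values a = 1, b = 2.

```python
import math

def F(x):
    """Distribution function of Example 4.1c."""
    return 0.0 if x <= 0 else 1.0 - math.exp(-x ** 2)

p_exceeds_1 = 1.0 - F(1.0)            # P{X > 1} = 1 - F(1)
print(round(p_exceeds_1, 3))          # prints 0.368

p_between = F(2.0) - F(1.0)           # P{1 < X <= 2} = F(2) - F(1)
print(p_between)
```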

4.2 TYPES OF RANDOM VARIABLES 

As was previously mentioned, a random variable whose set of possible values is a sequence
is said to be discrete. For a discrete random variable X, we define the probability mass
function p(a) of X by

$$p(a) = P\{X = a\}$$

The probability mass function p(a) is positive for at most a countable number of values
of a. That is, if X must assume one of the values $x_1, x_2, \ldots$, then

$$\begin{aligned}
p(x_i) &> 0, \quad i = 1, 2, \ldots \\
p(x) &= 0, \quad \text{all other values of } x
\end{aligned}$$

Since X must take on one of the values $x_i$, we have

$$\sum_{i=1}^{\infty} p(x_i) = 1$$

EXAMPLE 4.2a Consider a random variable X that is equal to 1, 2, or 3. If we know that

$$p(1) = \tfrac{1}{2} \quad \text{and} \quad p(2) = \tfrac{1}{3}$$

then it follows (since p(1) + p(2) + p(3) = 1) that

$$p(3) = \tfrac{1}{6}$$

A graph of p(x) is presented in Figure 4.1. ■

The cumulative distribution function F can be expressed in terms of p(x) by

$$F(a) = \sum_{\text{all } x \le a} p(x)$$

If X is a discrete random variable whose set of possible values are $x_1, x_2, x_3, \ldots$, where
$x_1 < x_2 < x_3 < \cdots$, then its distribution function F is a step function. That is, the value
of F is constant in the intervals $[x_{i-1}, x_i)$ and then takes a step (or jump) of size $p(x_i)$ at $x_i$.
FIGURE 4.1 Graph of p(x), Example 4.2a.

FIGURE 4.2 Graph of F(x).



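For instance, suppose X has a probability mass function given (as in Example 4.2a) by

$$p(1) = \tfrac{1}{2}, \quad p(2) = \tfrac{1}{3}, \quad p(3) = \tfrac{1}{6}$$

Then the cumulative distribution function F of X is given by

$$F(a) = \begin{cases} 0 & a < 1 \\ \tfrac{1}{2} & 1 \le a < 2 \\ \tfrac{5}{6} & 2 \le a < 3 \\ 1 & 3 \le a \end{cases}$$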



This is graphically presented in Figure 4.2. 
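The step-function construction $F(a) = \sum_{x \le a} p(x)$ can be sketched as follows. This is a minimal Python illustration, assuming the mass function p(1) = 1/2, p(2) = 1/3, p(3) = 1/6 of Example 4.2a; `step_cdf` is a hypothetical helper name. A binary search (`bisect_right`) counts how many support points lie at or below a.

```python
from bisect import bisect_right
from fractions import Fraction

def step_cdf(pmf):
    """Return F(a) = sum of p(x) over x <= a for a discrete pmf {x: p(x)}."""
    xs = sorted(pmf)
    cumulative = []
    total = Fraction(0)
    for x in xs:
        total += pmf[x]
        cumulative.append(total)      # F at (and just after) each jump point

    def F(a):
        k = bisect_right(xs, a)       # number of support points <= a
        return cumulative[k - 1] if k > 0 else Fraction(0)
    return F

F = step_cdf({1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)})
for a in (0.5, 1, 1.5, 2, 2.5, 3, 10):
    print(a, F(a))
```

F is right-continuous: the jump at each $x_i$ is already included in $F(x_i)$.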

Whereas the set of possible values of a discrete random variable is a sequence, we
often must consider random variables whose set of possible values is an interval. Let X
be such a random variable. We say that X is a continuous random variable if there exists
a nonnegative function f(x), defined for all real $x \in (-\infty, \infty)$, having the property that
for any set B of real numbers

$$P\{X \in B\} = \int_B f(x)\,dx \tag{4.2.1}$$

The function f(x) is called the probability density function of the random variable X.
In words, Equation 4.2.1 states that the probability that X will be in B may be obtained
by integrating the probability density function over the set B. Since X must assume some
value, f(x) must satisfy

$$1 = P\{X \in (-\infty, \infty)\} = \int_{-\infty}^{\infty} f(x)\,dx$$



All probability statements about X can be answered in terms of f(x). For instance, letting
B = [a, b], we obtain from Equation 4.2.1 that

$$P\{a \le X \le b\} = \int_a^b f(x)\,dx \tag{4.2.2}$$

If we let a = b in the above, then

$$P\{X = a\} = \int_a^a f(x)\,dx = 0$$

In words, this equation states that the probability that a continuous random variable will
assume any particular value is zero. (See Figure 4.3.)

The relationship between the cumulative distribution F(·) and the probability density
f(·) is expressed by

$$F(a) = P\{X \in (-\infty, a]\} = \int_{-\infty}^{a} f(x)\,dx$$

Differentiating both sides yields

$$\frac{d}{da} F(a) = f(a)$$



FIGURE 4.3 The probability density function $f(x) = e^{-x}$ for $x \ge 0$ and $f(x) = 0$ for $x < 0$. The area of the shaded region is $P\{a < X < b\}$.
That is, the density is the derivative of the cumulative distribution function. A somewhat
more intuitive interpretation of the density function may be obtained from Equation 4.2.2
as follows:

$$P\left\{a - \frac{\varepsilon}{2} \le X \le a + \frac{\varepsilon}{2}\right\} = \int_{a-\varepsilon/2}^{a+\varepsilon/2} f(x)\,dx \approx \varepsilon f(a)$$

when ε is small. In other words, the probability that X will be contained in an interval
of length ε around the point a is approximately ε f(a). From this, we see that f(a) is
a measure of how likely it is that the random variable will be near a.
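To see the approximation $P\{a - \varepsilon/2 \le X \le a + \varepsilon/2\} \approx \varepsilon f(a)$ numerically, the Python sketch below assumes the density $f(x) = e^{-x}$, $x \ge 0$, of Figure 4.3, whose distribution function is $F(x) = 1 - e^{-x}$. The point a = 1 and the values of ε are arbitrary; the printed ratio of exact probability to approximation should approach 1 as ε shrinks.

```python
import math

def f(x):
    """Density f(x) = e^{-x} for x >= 0, and 0 otherwise."""
    return math.exp(-x) if x >= 0 else 0.0

def F(x):
    """Corresponding distribution function F(x) = 1 - e^{-x} for x >= 0."""
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

a = 1.0
for eps in (0.5, 0.1, 0.01):
    exact = F(a + eps / 2) - F(a - eps / 2)   # P{a - eps/2 <= X <= a + eps/2}
    approx = eps * f(a)                       # the approximation eps * f(a)
    print(eps, exact, approx, exact / approx)
```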

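EXAMPLE 4.2b Suppose that X is a continuous random variable whose probability density
function is given by

$$f(x) = \begin{cases} C(4x - 2x^2) & 0 < x < 2 \\ 0 & \text{otherwise} \end{cases}$$

(a) What is the value of C?

(b) Find P{X > 1}.

SOLUTION (a) Since f is a probability density function, we must have that
$\int_{-\infty}^{\infty} f(x)\,dx = 1$, implying that

$$C \int_0^2 (4x - 2x^2)\,dx = 1$$

or

$$C\left[2x^2 - \frac{2x^3}{3}\right]_{x=0}^{x=2} = 1$$

or

$$C = \frac{3}{8}$$

(b) Hence

$$P\{X > 1\} = \int_1^{\infty} f(x)\,dx = \frac{3}{8}\int_1^2 (4x - 2x^2)\,dx = \frac{1}{2}$$

■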
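The two integrals in Example 4.2b can be checked with exact rational arithmetic. A minimal Python sketch, relying only on the antiderivative $2x^2 - 2x^3/3$ of $4x - 2x^2$ (the function name is illustrative):

```python
from fractions import Fraction

def antiderivative(x):
    """An antiderivative of 4x - 2x^2, namely 2x^2 - (2/3)x^3."""
    x = Fraction(x)
    return 2 * x**2 - Fraction(2, 3) * x**3

# (a) C times the integral of (4x - 2x^2) over (0, 2) must equal 1.
integral_0_2 = antiderivative(2) - antiderivative(0)
C = 1 / integral_0_2

# (b) P{X > 1} = C times the integral over (1, 2); f is 0 beyond x = 2.
p_gt_1 = C * (antiderivative(2) - antiderivative(1))
print(C, p_gt_1)   # prints: 3/8 1/2
```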



4.3 JOINTLY DISTRIBUTED RANDOM VARIABLES 

For a given experiment, we are often interested not only in probability distribution 
functions of individual random variables but also in the relationships between two or 
more random variables. For instance, in an experiment into the possible causes of cancer, 
we might be interested in the relationship between the average number of cigarettes
smoked daily and the age at which an individual contracts cancer. Similarly, an engineer
might be interested in the relationship between the shear strength and the diameter
of a spot weld in a fabricated sheet steel specimen.

To specify the relationship between two random variables, we define the joint
cumulative probability distribution function of X and Y by

$$F(x, y) = P\{X \le x, Y \le y\}$$

A knowledge of the joint probability distribution function enables one, at least in theory, to
compute the probability of any statement concerning the values of X and Y. For instance,
the distribution function of X, call it $F_X$, can be obtained from the joint distribution
function F of X and Y as follows:

$$\begin{aligned}
F_X(x) &= P\{X \le x\} \\
&= P\{X \le x, Y < \infty\} \\
&= F(x, \infty)
\end{aligned}$$

Similarly, the cumulative distribution function of Y is given by

$$F_Y(y) = F(\infty, y)$$

In the case where X and Y are both discrete random variables whose possible values
are, respectively, $x_1, x_2, \ldots$, and $y_1, y_2, \ldots$, we define the joint probability mass function of
X and Y, $p(x_i, y_j)$, by

$$p(x_i, y_j) = P\{X = x_i, Y = y_j\}$$

The individual probability mass functions of X and Y are easily obtained from the joint
probability mass function by the following reasoning. Since Y must take on some value $y_j$,
it follows that the event $\{X = x_i\}$ can be written as the union, over all j, of the mutually
exclusive events $\{X = x_i, Y = y_j\}$. That is,

$$\{X = x_i\} = \bigcup_j \{X = x_i, Y = y_j\}$$

and so, using Axiom 3 of the probability function, we see that

$$\begin{aligned}
P\{X = x_i\} &= P\left(\bigcup_j \{X = x_i, Y = y_j\}\right) \\
&= \sum_j P\{X = x_i, Y = y_j\} \\
&= \sum_j p(x_i, y_j)
\end{aligned} \tag{4.3.1}$$
Similarly, we can obtain $P\{Y = y_j\}$ by summing $p(x_i, y_j)$ over all possible values of $x_i$,
that is,

$$P\{Y = y_j\} = \sum_i P\{X = x_i, Y = y_j\} = \sum_i p(x_i, y_j) \tag{4.3.2}$$

Hence, specifying the joint probability mass function always determines the individual mass
functions. However, it should be noted that the reverse is not true. Namely, knowledge of
$P\{X = x_i\}$ and $P\{Y = y_j\}$ does not determine the value of $P\{X = x_i, Y = y_j\}$.

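EXAMPLE 4.3a Suppose that 3 batteries are randomly chosen from a group of 3 new, 4
used but still working, and 5 defective batteries. If we let X and Y denote, respectively,
the number of new and used but still working batteries that are chosen, then the joint
probability mass function of X and Y, $p(i, j) = P\{X = i, Y = j\}$, is given by

$$\begin{aligned}
p(0,0) &= \binom{5}{3}\Big/\binom{12}{3} = 10/220 \\
p(0,1) &= \binom{4}{1}\binom{5}{2}\Big/\binom{12}{3} = 40/220 \\
p(0,2) &= \binom{4}{2}\binom{5}{1}\Big/\binom{12}{3} = 30/220 \\
p(0,3) &= \binom{4}{3}\Big/\binom{12}{3} = 4/220 \\
p(1,0) &= \binom{3}{1}\binom{5}{2}\Big/\binom{12}{3} = 30/220 \\
p(1,1) &= \binom{3}{1}\binom{4}{1}\binom{5}{1}\Big/\binom{12}{3} = 60/220 \\
p(1,2) &= \binom{3}{1}\binom{4}{2}\Big/\binom{12}{3} = 18/220 \\
p(2,0) &= \binom{3}{2}\binom{5}{1}\Big/\binom{12}{3} = 15/220 \\
p(2,1) &= \binom{3}{2}\binom{4}{1}\Big/\binom{12}{3} = 12/220 \\
p(3,0) &= \binom{3}{3}\Big/\binom{12}{3} = 1/220
\end{aligned}$$

These probabilities can most easily be expressed in tabular form as shown in Table 4.1.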
TABLE 4.1 P{X = i, Y = j}

                   j = 0     j = 1     j = 2     j = 3     Row Sum = P{X = i}
i = 0              10/220    40/220    30/220    4/220     84/220
i = 1              30/220    60/220    18/220    0         108/220
i = 2              15/220    12/220    0         0         27/220
i = 3              1/220     0         0         0         1/220
Column Sums =
P{Y = j}           56/220    112/220   48/220    4/220





The reader should note that the probability mass function of X is obtained by computing
the row sums, in accordance with Equation 4.3.1, whereas the probability mass function
of Y is obtained by computing the column sums, in accordance with Equation 4.3.2.
Because the individual probability mass functions of X and Y thus appear in the margin of
such a table, they are often referred to as being the marginal probability mass functions of
X and Y, respectively. It should be noted that to check the correctness of such a table we
could sum the marginal row (or the marginal column) and verify that its sum is 1. (Why
must the sum of the entries in the marginal row (or column) equal 1?) ■
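Table 4.1 and its margins can be regenerated by counting. The Python sketch below is an illustration only (the constant and variable names are not from the text); it computes each p(i, j) from binomial coefficients, then forms row sums (Equation 4.3.1) and column sums (Equation 4.3.2), printing every entry as a numerator over 220.

```python
from fractions import Fraction
from math import comb

NEW, USED, DEFECTIVE, DRAWN = 3, 4, 5, 3
total = comb(NEW + USED + DEFECTIVE, DRAWN)     # C(12, 3) = 220

# p(i, j) = P{X = i, Y = j}: i new, j used, and the rest defective.
joint = {}
for i in range(DRAWN + 1):
    for j in range(DRAWN + 1 - i):
        ways = comb(NEW, i) * comb(USED, j) * comb(DEFECTIVE, DRAWN - i - j)
        joint[(i, j)] = Fraction(ways, total)

# Row sums give P{X = i} (Eq. 4.3.1); column sums give P{Y = j} (Eq. 4.3.2).
p_X = {i: sum(p for (x, _), p in joint.items() if x == i) for i in range(4)}
p_Y = {j: sum(p for (_, y), p in joint.items() if y == j) for j in range(4)}

# Print numerators over 220; cells with i + j > 3 are impossible and shown as 0.
for i in range(4):
    row = [int(joint.get((i, j), 0) * total) for j in range(4)]
    print(i, row, int(p_X[i] * total))
print("col", [int(p_Y[j] * total) for j in range(4)])

# Either margin must sum to 1.
assert sum(p_X.values()) == 1 and sum(p_Y.values()) == 1
```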

EXAMPLE 4.3b Suppose that 15 percent of the families in a certain community have no 
children, 20 percent have 1, 35 percent have 2, and 30 percent have 3 children; suppose 
further that each child is equally likely (and independently) to be a boy or a girl. If a 
family is chosen at random from this community, then B, the number of boys, and G, 
the number of girls, in this family will have the joint probability mass function shown 
in Table 4.2. 



TABLE 4.2 P{B = i, G = j}

                   j = 0     j = 1     j = 2     j = 3     Row Sum = P{B = i}
i = 0              .15       .10       .0875     .0375     .3750
i = 1              .10       .175      .1125     0         .3875
i = 2              .0875     .1125     0         0         .2000
i = 3              .0375     0         0         0         .0375
Column Sum =
P{G = j}           .3750     .3875     .2000     .0375
These probabilities are obtained as follows:

$$\begin{aligned}
P\{B = 0, G = 0\} &= P\{\text{no children}\} = .15 \\
P\{B = 0, G = 1\} &= P\{\text{1 girl and total of 1 child}\} \\
&= P\{\text{1 child}\}\,P\{\text{1 girl} \mid \text{1 child}\} = (.20)\left(\tfrac{1}{2}\right) = .10 \\
P\{B = 0, G = 2\} &= P\{\text{2 girls and total of 2 children}\} \\
&= P\{\text{2 children}\}\,P\{\text{2 girls} \mid \text{2 children}\} = (.35)\left(\tfrac{1}{2}\right)^2 = .0875 \\
P\{B = 0, G = 3\} &= P\{\text{3 girls and total of 3 children}\} \\
&= P\{\text{3 children}\}\,P\{\text{3 girls} \mid \text{3 children}\} = (.30)\left(\tfrac{1}{2}\right)^3 = .0375
\end{aligned}$$

We leave it to the reader to verify the remainder of Table 4.2, which tells us, among other
things, that the family chosen will have at least 1 girl with probability .625. ■
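The remainder of Table 4.2 can be filled in the same way as the four entries above. Here is a minimal Python sketch, assuming, as the example states, that each child is independently a boy or a girl with probability 1/2; the entries are floating-point, so they match the table only up to rounding.

```python
from math import comb

# Example 4.3b: distribution of the number of children in a family.
p_children = {0: 0.15, 1: 0.20, 2: 0.35, 3: 0.30}

# Given n children, the number of boys is binomial(n, 1/2), so
# P{B = i, G = n - i} = P{n children} * C(n, i) * (1/2)^n.
joint = {}
for n, pn in p_children.items():
    for i in range(n + 1):
        joint[(i, n - i)] = pn * comb(n, i) * 0.5 ** n

p_B = {i: sum(p for (b, _), p in joint.items() if b == i) for i in range(4)}
p_G = {j: sum(p for (_, g), p in joint.items() if g == j) for j in range(4)}

for i in range(4):
    print(i, [round(joint.get((i, j), 0.0), 4) for j in range(4)], round(p_B[i], 4))
print("P{G = j}:", [round(p_G[j], 4) for j in range(4)])
print("P{at least 1 girl} =", round(1 - p_G[0], 4))
```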

We say that X and Y are jointly continuous if there exists a function f(x, y) defined for
all real x and y, having the property that for every set C of pairs of real numbers (that is,
C is a set in the two-dimensional plane)

$$P\{(X, Y) \in C\} = \iint_{(x,y) \in C} f(x, y)\,dx\,dy \tag{4.3.3}$$

The function f(x, y) is called the joint probability density function of X and Y. If A and B
are any sets of real numbers, then by defining $C = \{(x, y) : x \in A, y \in B\}$, we see from
Equation 4.3.3 that

$$P\{X \in A, Y \in B\} = \int_B \int_A f(x, y)\,dx\,dy \tag{4.3.4}$$

Because

$$F(a, b) = P\{X \in (-\infty, a], Y \in (-\infty, b]\} = \int_{-\infty}^{b} \int_{-\infty}^{a} f(x, y)\,dx\,dy$$
it follows, upon differentiation, that

$$f(a, b) = \frac{\partial^2}{\partial a\,\partial b} F(a, b)$$

wherever the partial derivatives are defined. Another interpretation of the joint density
function is obtained from Equation 4.3.4 as follows:

$$P\{a < X < a + da,\ b < Y < b + db\} = \int_b^{b+db} \int_a^{a+da} f(x, y)\,dx\,dy \approx f(a, b)\,da\,db$$

when da and db are small and f(x, y) is continuous at a, b. Hence f(a, b) is a measure of
how likely it is that the random vector (X, Y) will be near (a, b).

If X and Y are jointly continuous, they are individually continuous, and their
probability density functions can be obtained as follows:

$$\begin{aligned}
P\{X \in A\} &= P\{X \in A, Y \in (-\infty, \infty)\} \\
&= \int_A \int_{-\infty}^{\infty} f(x, y)\,dy\,dx \\
&= \int_A f_X(x)\,dx
\end{aligned} \tag{4.3.5}$$

where

$$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy$$

is thus the probability density function of X. Similarly, the probability density function
of Y is given by

$$f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\,dx \tag{4.3.6}$$

EXAMPLE 4.3c The joint density function of X and Y is given by

$$f(x, y) = \begin{cases} 2e^{-x}e^{-2y} & 0 < x < \infty,\ 0 < y < \infty \\ 0 & \text{otherwise} \end{cases}$$

Compute (a) P{X > 1, Y < 1}; (b) P{X < Y}; and (c) P{X < a}.
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SOLUTION 

(a) 



(b) 



(c) 



p 1 pOO 

P{X > 1, Y < 1} = / / 2e- x e- 2 y dxdy 

= [ 2e- 2 y{-e-*\f)dy 

Jo 

= e~ l f 2e~ 2 y dy 
Jo 



= e- 1 n-e- 2 ) 



(x,y):x<y 

= [ f 2e~ x e- 2 y dxdy 
Jo Jo 

= / 2e- 2 y{\-e-y)dy 
Jo 

poo poo 

= / 2e~ 2 ydy- / 2e~ 3 y, 
Jo Jo 



1 3 

\ 

3 



pa poo 

P{X<a}= / 2e- 2 y e - x dydx 

Jo Jo 

-[■ 

Jo 



e x dx 
io 



= l-e~ a M 
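Because the joint density of Example 4.3c factors as $e^{-x}\cdot 2e^{-2y}$, X and Y behave as independent exponential random variables with rates 1 and 2 (independence via such a factorization is taken up in Section 4.3.1), so all three answers can be checked by simulation. A minimal Monte Carlo sketch using Python's `random.expovariate`; the value a = 0.5, the sample size, and the seed are illustrative choices.

```python
import math
import random

def simulate_example_43c(n_samples=200_000, a=0.5, seed=1):
    """Monte Carlo estimates of P{X>1, Y<1}, P{X<Y}, P{X<a} for f(x,y) = 2e^{-x}e^{-2y}."""
    rng = random.Random(seed)
    hits_a = hits_b = hits_c = 0
    for _ in range(n_samples):
        x = rng.expovariate(1.0)  # X has density e^{-x}, x > 0
        y = rng.expovariate(2.0)  # Y has density 2e^{-2y}, y > 0
        hits_a += (x > 1 and y < 1)
        hits_b += (x < y)
        hits_c += (x < a)
    return hits_a / n_samples, hits_b / n_samples, hits_c / n_samples

exact = (math.exp(-1) * (1 - math.exp(-2)), 1 / 3, 1 - math.exp(-0.5))
estimates = simulate_example_43c()
for est, ex in zip(estimates, exact):
    print(f"estimate {est:.4f}   exact {ex:.4f}")
```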

4.3.1 Independent Random Variables 

The random variables X and Y are said to be independent if for any two sets of real 
numbers A and B 

$$P\{X\in A, Y\in B\} = P\{X\in A\}P\{Y\in B\} \tag{4.3.7}$$

In other words, X and Y are independent if, for all A and B, the events $E_A = \{X\in A\}$ 
and $E_B = \{Y\in B\}$ are independent. 

It can be shown by using the three axioms of probability that Equation 4.3.7 will follow 
if and only if for all a, b 

$$P\{X\le a, Y\le b\} = P\{X\le a\}P\{Y\le b\}$$

Hence, in terms of the joint distribution function F of X and Y, we have that X and Y 
are independent if 

$$F(a,b) = F_X(a)F_Y(b)\quad\text{for all } a, b$$

When X and Y are discrete random variables, the condition of independence 
Equation 4.3.7 is equivalent to 

$$p(x,y) = p_X(x)p_Y(y)\quad\text{for all } x, y \tag{4.3.8}$$

where $p_X$ and $p_Y$ are the probability mass functions of X and Y. The equivalence follows 
because, if Equation 4.3.7 is satisfied, then we obtain Equation 4.3.8 by letting A and B 
be, respectively, the one-point sets A = {x}, B = {y}. Furthermore, if Equation 4.3.8 is 
valid, then for any sets A, B 

$$\begin{aligned} P\{X\in A, Y\in B\} &= \sum_{y\in B}\sum_{x\in A} p(x,y) \\ &= \sum_{y\in B}\sum_{x\in A} p_X(x)p_Y(y) \\ &= \sum_{y\in B} p_Y(y)\sum_{x\in A} p_X(x) \\ &= P\{Y\in B\}P\{X\in A\} \end{aligned}$$

and thus Equation 4.3.7 is established. 

In the jointly continuous case, the condition of independence is equivalent to 

$$f(x,y) = f_X(x)f_Y(y)\quad\text{for all } x, y$$

Loosely speaking, X and Y are independent if knowing the value of one does not change 
the distribution of the other. Random variables that are not independent are said to be 
dependent. 
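Equation 4.3.8 gives a mechanical test of independence for discrete random variables: compute both marginals and compare every joint probability with the product of the marginals. A minimal sketch, assuming the joint pmf is stored as a Python dict keyed by (x, y) pairs; the function names and the floating-point tolerance `tol` are illustrative choices. The dependent case reuses the joint pmf of Example 4.3g.

```python
from itertools import product

def marginals(joint):
    """Marginal pmfs p_X and p_Y of a joint pmf given as {(x, y): probability}."""
    p_x, p_y = {}, {}
    for (x, y), p in joint.items():
        p_x[x] = p_x.get(x, 0.0) + p
        p_y[y] = p_y.get(y, 0.0) + p
    return p_x, p_y

def is_independent(joint, tol=1e-12):
    """Check Equation 4.3.8, p(x, y) = p_X(x) p_Y(y), at every pair of support points."""
    p_x, p_y = marginals(joint)
    return all(abs(joint.get((x, y), 0.0) - p_x[x] * p_y[y]) <= tol
               for x, y in product(p_x, p_y))

# Joint pmf of Example 4.3g (dependent) and a pmf built as a product (independent)
dependent = {(0, 0): .4, (0, 1): .2, (1, 0): .1, (1, 1): .3}
independent = {(x, y): px * py
               for (x, px), (y, py) in product([(0, .5), (1, .5)], [(0, .3), (1, .7)])}
print(is_independent(dependent), is_independent(independent))  # False True
```

Pairs (x, y) missing from the dict are treated as having probability 0, so a zero joint probability alongside two positive marginals is correctly flagged as dependence.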

EXAMPLE 4.3d Suppose that X and Y are independent random variables having the 
common density function 

$$f(x) = \begin{cases} e^{-x} & x > 0 \\ 0 & \text{otherwise} \end{cases}$$

Find the density function of the random variable X/Y. 

SOLUTION We start by determining the distribution function of X/Y. For a > 0, 

$$\begin{aligned} F_{X/Y}(a) &= P\{X/Y \le a\} \\ &= \iint_{x/y\le a} f(x,y)\,dx\,dy \\ &= \iint_{x/y\le a} e^{-x}e^{-y}\,dx\,dy \\ &= \int_0^\infty\int_0^{ay} e^{-x}e^{-y}\,dx\,dy \\ &= \int_0^\infty \left(1-e^{-ay}\right)e^{-y}\,dy \\ &= \left[-e^{-y} + \frac{e^{-(a+1)y}}{a+1}\right]_0^\infty \\ &= 1 - \frac{1}{a+1} \end{aligned}$$

Differentiation yields that the density function of X/Y is given by 

$$f_{X/Y}(a) = \frac{1}{(a+1)^2},\qquad 0 < a < \infty$$ ■
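The distribution function found in Example 4.3d can be checked by simulating pairs of independent exponential random variables with rate 1. A minimal sketch; the test points a, the sample size, and the seed are arbitrary illustrative choices.

```python
import random

def ratio_cdf_estimate(a, n_samples=200_000, seed=2):
    """Estimate P{X/Y <= a} for independent X, Y with common density e^{-x}, x > 0."""
    rng = random.Random(seed)
    # X/Y <= a is tested as X <= aY, which is equivalent because Y > 0
    hits = sum(rng.expovariate(1.0) <= a * rng.expovariate(1.0)
               for _ in range(n_samples))
    return hits / n_samples

for a in (0.5, 1.0, 3.0):
    print(a, ratio_cdf_estimate(a), 1 - 1 / (a + 1))  # estimate next to F(a) = 1 - 1/(a+1)
```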

We can also define joint probability distributions for n random variables in exactly 
the same manner as we did for n = 2. For instance, the joint cumulative probability 
distribution function $F(a_1, a_2, \dots, a_n)$ of the n random variables $X_1, X_2, \dots, X_n$ is defined 
by 

$$F(a_1, a_2, \dots, a_n) = P\{X_1 \le a_1, X_2 \le a_2, \dots, X_n \le a_n\}$$

If these random variables are discrete, we define their joint probability mass function 
$p(x_1, x_2, \dots, x_n)$ by 

$$p(x_1, x_2, \dots, x_n) = P\{X_1 = x_1, X_2 = x_2, \dots, X_n = x_n\}$$

Further, the n random variables are said to be jointly continuous if there exists a function 
$f(x_1, x_2, \dots, x_n)$, called the joint probability density function, such that for any set C in 
n-space 

$$P\{(X_1, X_2, \dots, X_n)\in C\} = \int\cdots\int_{(x_1,\dots,x_n)\in C} f(x_1,\dots,x_n)\,dx_1\,dx_2\cdots dx_n$$

In particular, for any n sets of real numbers $A_1, A_2, \dots, A_n$ 

$$P\{X_1\in A_1, X_2\in A_2, \dots, X_n\in A_n\} = \int_{A_n}\int_{A_{n-1}}\cdots\int_{A_1} f(x_1,\dots,x_n)\,dx_1\,dx_2\cdots dx_n$$

The concept of independence may, of course, also be defined for more than two random 
variables. In general, the n random variables $X_1, X_2, \dots, X_n$ are said to be independent if, 
for all sets of real numbers $A_1, A_2, \dots, A_n$, 

$$P\{X_1\in A_1, X_2\in A_2, \dots, X_n\in A_n\} = \prod_{i=1}^n P\{X_i\in A_i\}$$

As before, it can be shown that this condition is equivalent to 

$$P\{X_1\le a_1, X_2\le a_2, \dots, X_n\le a_n\} = \prod_{i=1}^n P\{X_i\le a_i\}\quad\text{for all } a_1, a_2, \dots, a_n$$



Finally, we say that an infinite collection of random variables is independent if every finite 
subcollection of them is independent. 

EXAMPLE 4.3e Suppose that the successive daily changes of the price of a given stock are 
assumed to be independent and identically distributed random variables with probability 
mass function given by 

$$P\{\text{daily change is } i\} = \begin{cases} .05 & i = -3 \\ .10 & i = -2 \\ .20 & i = -1 \\ .30 & i = 0 \\ .20 & i = 1 \\ .10 & i = 2 \\ .05 & i = 3 \end{cases}$$

Then the probability that the stock's price will increase successively by 1, 2, and 0 points 
in the next three days is 

$$P\{X_1 = 1, X_2 = 2, X_3 = 0\} = (.20)(.10)(.30) = .006$$

where we have let $X_i$ denote the change on the ith day. ■ 
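Independence is what lets Example 4.3e multiply single-day probabilities. The sketch below assumes the pmf is stored as a dict and computes the probability of any sequence of daily changes the same way; `path_probability` is an illustrative name, and enumerating all 7^3 three-day sequences to confirm that their probabilities sum to 1 is an extra sanity check, not part of the example.

```python
from itertools import product

# Probability mass function of a single day's price change (Example 4.3e)
daily_pmf = {-3: .05, -2: .10, -1: .20, 0: .30, 1: .20, 2: .10, 3: .05}

def path_probability(changes, pmf=daily_pmf):
    """P{X_1 = c_1, ..., X_n = c_n} for independent, identically distributed daily changes."""
    prob = 1.0
    for c in changes:
        prob *= pmf.get(c, 0.0)  # a change outside the support has probability 0
    return prob

print(path_probability([1, 2, 0]))  # approximately .006

# Sanity check: the probabilities of all 7**3 three-day paths sum to 1
total = sum(path_probability(path) for path in product(daily_pmf, repeat=3))
print(total)
```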
*4.3.2 Conditional Distributions 

The relationship between two random variables can often be clarified by consideration of 
the conditional distribution of one given the value of the other. 

Recall that for any two events E and F, the conditional probability of E given F is 
defined, provided that P(F) > 0, by 

$$P(E\mid F) = \frac{P(EF)}{P(F)}$$

Hence, if X and Y are discrete random variables, it is natural to define the conditional 
probability mass function of X given that Y = y, by 

$$p_{X|Y}(x\mid y) = P\{X = x\mid Y = y\} = \frac{P\{X = x, Y = y\}}{P\{Y = y\}} = \frac{p(x,y)}{p_Y(y)}$$

for all values of y such that $p_Y(y) > 0$. 

EXAMPLE 4.3f If we know, in Example 4.3b, that the family chosen has one girl, compute 
the conditional probability mass function of the number of boys in the family. 

SOLUTION We first note from Table 4.2 that 

$$P\{G = 1\} = .3875$$

Hence, 

$$\begin{aligned} P\{B = 0\mid G = 1\} &= \frac{P\{B = 0, G = 1\}}{P\{G = 1\}} = \frac{.10}{.3875} = 8/31 \\ P\{B = 1\mid G = 1\} &= \frac{P\{B = 1, G = 1\}}{P\{G = 1\}} = \frac{.175}{.3875} = 14/31 \\ P\{B = 2\mid G = 1\} &= \frac{P\{B = 2, G = 1\}}{P\{G = 1\}} = \frac{.1125}{.3875} = 9/31 \\ P\{B = 3\mid G = 1\} &= \frac{P\{B = 3, G = 1\}}{P\{G = 1\}} = 0 \end{aligned}$$

Thus, for instance, given 1 girl, there are 23 chances out of 31 that there will also be 
at least 1 boy. ■ 
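The arithmetic of Example 4.3f can be checked exactly with rational numbers. The sketch below, assuming Python's `fractions` module, takes the four joint probabilities P{B = b, G = 1} quoted in the solution (Table 4.2 itself is not reproduced here) and divides each by their sum, which reproduces P{G = 1} = .3875; the variable names are illustrative.

```python
from fractions import Fraction

# Joint probabilities P{B = b, G = 1} for b = 0, 1, 2, 3, as used in Example 4.3f
joint_given_one_girl = {0: Fraction("0.10"), 1: Fraction("0.175"),
                        2: Fraction("0.1125"), 3: Fraction(0)}

p_g1 = sum(joint_given_one_girl.values())  # P{G = 1} = .3875 = 31/80
conditional = {b: p / p_g1 for b, p in joint_given_one_girl.items()}
print(conditional)  # {0: Fraction(8, 31), 1: Fraction(14, 31), 2: Fraction(9, 31), 3: Fraction(0, 1)}
print(sum(p for b, p in conditional.items() if b >= 1))  # 23/31
```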

* Optional section. 

EXAMPLE 4.3g Suppose that p(x,y), the joint probability mass function of X and Y, is 
given by 

$$p(0,0) = .4,\quad p(0,1) = .2,\quad p(1,0) = .1,\quad p(1,1) = .3$$

Calculate the conditional probability mass function of X given that Y = 1. 
SOLUTION We first note that 

$$P\{Y = 1\} = \sum_x p(x,1) = p(0,1) + p(1,1) = .5$$

Hence, 

$$P\{X = 0\mid Y = 1\} = \frac{p(0,1)}{P\{Y = 1\}} = 2/5$$

$$P\{X = 1\mid Y = 1\} = \frac{p(1,1)}{P\{Y = 1\}} = 3/5$$ ■
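The computation in Example 4.3g generalizes to any discrete joint pmf: keep the entries with the given y, then divide by their total $p_Y(y)$. A minimal sketch with exact fractions; `conditional_pmf_x_given_y` is an illustrative name, and the function raises an error when $p_Y(y) = 0$, where the conditional pmf is undefined.

```python
from fractions import Fraction

def conditional_pmf_x_given_y(joint, y):
    """Return {x: p_{X|Y}(x | y)} from a joint pmf {(x, y): prob}; requires p_Y(y) > 0."""
    p_y = sum(p for (_, yy), p in joint.items() if yy == y)
    if p_y == 0:
        raise ValueError("conditional pmf undefined: P{Y = y} = 0")
    return {x: p / p_y for (x, yy), p in joint.items() if yy == y}

# Joint pmf of Example 4.3g
joint = {(0, 0): Fraction(4, 10), (0, 1): Fraction(2, 10),
         (1, 0): Fraction(1, 10), (1, 1): Fraction(3, 10)}
print(conditional_pmf_x_given_y(joint, 1))  # {0: Fraction(2, 5), 1: Fraction(3, 5)}
```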

If X and Y have a joint probability density function f(x,y), then the conditional 
probability density function of X, given that Y = y, is defined for all values of y such 
that $f_Y(y) > 0$, by 

$$f_{X|Y}(x\mid y) = \frac{f(x,y)}{f_Y(y)}$$

To motivate this definition, multiply the left-hand side by dx and the right-hand side by 
(dx dy)/dy to obtain 

$$\begin{aligned} f_{X|Y}(x\mid y)\,dx &= \frac{f(x,y)\,dx\,dy}{f_Y(y)\,dy} \\ &\approx \frac{P\{x \le X \le x+dx,\ y \le Y \le y+dy\}}{P\{y \le Y \le y+dy\}} \\ &= P\{x \le X \le x+dx \mid y \le Y \le y+dy\} \end{aligned}$$

In other words, for small values of dx and dy, $f_{X|Y}(x\mid y)\,dx$ represents the conditional 
probability that X is between x and x + dx, given that Y is between y and y + dy. 

The use of conditional densities allows us to define conditional probabilities of events 
associated with one random variable when we are given the value of a second random 
variable. That is, if X and Y are jointly continuous, then, for any set A, 

$$P\{X\in A\mid Y = y\} = \int_A f_{X|Y}(x\mid y)\,dx$$

EXAMPLE 4.3h The joint density of X and Y is given by 

$$f(x,y) = \begin{cases} \frac{12}{5}x(2-x-y) & 0 < x < 1,\ 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$$

Compute the conditional density of X, given that Y = y, where 0 < y < 1. 
SOLUTION For 0 < x < 1, 0 < y < 1, we have 

$$\begin{aligned} f_{X|Y}(x\mid y) &= \frac{f(x,y)}{f_Y(y)} = \frac{f(x,y)}{\int_{-\infty}^{\infty} f(x,y)\,dx} \\ &= \frac{x(2-x-y)}{\int_0^1 x(2-x-y)\,dx} \\ &= \frac{x(2-x-y)}{\frac{2}{3} - \frac{y}{2}} \\ &= \frac{6x(2-x-y)}{4-3y} \end{aligned}$$ ■
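A quick consistency check on Example 4.3h: for each fixed y in (0, 1), the conditional density should integrate to 1 over x. A minimal sketch using the midpoint rule (our choice of quadrature); the sample values of y are arbitrary.

```python
def conditional_density(x, y):
    """f_{X|Y}(x | y) = 6x(2 - x - y) / (4 - 3y) for 0 < x < 1, 0 < y < 1 (Example 4.3h)."""
    if 0 < x < 1:
        return 6 * x * (2 - x - y) / (4 - 3 * y)
    return 0.0

def integrate(g, lo, hi, steps=10_000):
    """Midpoint-rule approximation of the integral of g over (lo, hi)."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (k + 0.5) * h) for k in range(steps))

# For each fixed y the conditional density should integrate to 1 over x
for y in (0.1, 0.5, 0.9):
    print(y, integrate(lambda x: conditional_density(x, y), 0.0, 1.0))
```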

4.4 EXPECTATION 

One of the most important concepts in probability theory is that of the expectation 
of a random variable. If X is a discrete random variable taking on the possible values 
$x_1, x_2, \dots$, then the expectation or expected value of X, denoted by E[X], is defined by 

$$E[X] = \sum_i x_i P\{X = x_i\}$$

In words, the expected value of X is a weighted average of the possible values that X can 
take on, each value being weighted by the probability that X assumes it. For instance, if 
the probability mass function of X is given by 

$$p(0) = \frac{1}{2} = p(1)$$

then 

$$E[X] = 0\left(\tfrac{1}{2}\right) + 1\left(\tfrac{1}{2}\right) = \tfrac{1}{2}$$

is just the ordinary average of the two possible values 0 and 1 that X can assume. On the 
other hand, if 

$$p(0) = \frac{1}{3},\qquad p(1) = \frac{2}{3}$$

then 

$$E[X] = 0\left(\tfrac{1}{3}\right) + 1\left(\tfrac{2}{3}\right) = \tfrac{2}{3}$$

is a weighted average of the two possible values 0 and 1 where the value 1 is given twice as 
much weight as the value 0 since p(1) = 2p(0). 

Another motivation of the definition of expectation is provided by the frequency 
interpretation of probabilities. This interpretation assumes that if an infinite sequence 
of independent replications of an experiment is performed, then for any event E, the 
proportion of time that E occurs will be P(E). Now, consider a random variable X that must 
take on one of the values $x_1, x_2, \dots, x_n$ with respective probabilities $p(x_1), p(x_2), \dots, p(x_n)$; 
and think of X as representing our winnings in a single game of chance. That is, with 
probability $p(x_i)$ we shall win $x_i$ units, i = 1, 2, ..., n. Now by the frequency 
interpretation, it follows that if we continually play this game, then the proportion of time 
that we win $x_i$ will be $p(x_i)$. Since this is true for all i, i = 1, 2, ..., n, it follows that our 
average winnings per game will be 

$$\sum_{i=1}^n x_i p(x_i) = E[X]$$

To see this argument more clearly, suppose that we play N games where N is very large. 
Then in approximately $Np(x_i)$ of these games, we shall win $x_i$, and thus our total winnings 
in the N games will be 

$$\sum_{i=1}^n x_i N p(x_i)$$

implying that our average winnings per game are 

$$\frac{\sum_{i=1}^n x_i N p(x_i)}{N} = \sum_{i=1}^n x_i p(x_i) = E[X]$$

EXAMPLE 4.4a Find E[X] where X is the outcome when we roll a fair die. 
SOLUTION Since p(1) = p(2) = p(3) = p(4) = p(5) = p(6) = 1/6, we obtain that 

$$E[X] = 1\left(\tfrac{1}{6}\right) + 2\left(\tfrac{1}{6}\right) + 3\left(\tfrac{1}{6}\right) + 4\left(\tfrac{1}{6}\right) + 5\left(\tfrac{1}{6}\right) + 6\left(\tfrac{1}{6}\right) = \tfrac{7}{2}$$

The reader should note that, for this example, the expected value of X is not a value that X 
could possibly assume. (That is, rolling a die cannot possibly lead to an outcome of 7/2.) 
Thus, even though we call E[X] the expectation of X, it should not be interpreted as the 
value that we expect X to have but rather as the average value of X in a large number of 
repetitions of the experiment. That is, if we continually roll a fair die, then after a large 
number of rolls the average of all the outcomes will be approximately 7/2. (The interested 
reader should try this as an experiment.) ■ 
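The long-run-average interpretation at the end of Example 4.4a (and the frequency argument that precedes it) is easy to try by simulation, as the text suggests. A minimal sketch using Python's `random` module; the seed and the sample sizes are arbitrary illustrative choices.

```python
import random

def average_of_rolls(n_rolls, seed=3):
    """Average outcome of n_rolls simulated rolls of a fair six-sided die."""
    rng = random.Random(seed)
    return sum(rng.randint(1, 6) for _ in range(n_rolls)) / n_rolls

for n in (10, 1_000, 100_000):
    print(n, average_of_rolls(n))  # tends toward E[X] = 3.5 as n grows
```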

EXAMPLE 4.4b If I is an indicator random variable for the event A, that is, if 

$$I = \begin{cases} 1 & \text{if } A \text{ occurs} \\ 0 & \text{if } A \text{ does not occur} \end{cases}$$

then 

$$E[I] = 1P(A) + 0P(A^c) = P(A)$$

Hence, the expectation of the indicator random variable for the event A is just the 
probability that A occurs. ■ 

EXAMPLE 4.4c Entropy For a given random variable X, how much information is conveyed 
in the message that X = x? Let us begin our attempts at quantifying this statement by 
agreeing that the amount of information in the message that X = x should depend on 
how likely it was that X would equal x. In addition, it seems reasonable that the more 
unlikely it was that X would equal x, the more informative would be the message. For 
instance, if X represents the sum of two fair dice, then there seems to be more information 
in the message that X equals 12 than there would be in the message that X equals 7, since 
the former event has probability 1/36 and the latter 1/6. 

Let us denote by I(p) the amount of information contained in the message that an event, 
whose probability is p, has occurred. Clearly I(p) should be a nonnegative, decreasing 
function of p. To determine its form, let X and Y be independent random variables, and 
suppose that P{X = x} = p and P{Y = y} = q. How much information is contained in 
the message that X equals x and Y equals y? To answer this, note first that the amount 
of information in the statement that X equals x is I(p). Also, since knowledge of the fact 
that X is equal to x does not affect the probability that Y will equal y (since X and Y are 
independent), it seems reasonable that the additional amount of information contained in 
the statement that Y = y should equal I(q). Thus, it seems that the amount of information 
in the message that X equals x and Y equals y is I(p) + I(q). On the other hand, however, 
we have that 

$$P\{X = x, Y = y\} = P\{X = x\}P\{Y = y\} = pq$$

which implies that the amount of information in the message that X equals x and Y equals 
y is I(pq). Therefore, it seems that the function I should satisfy the identity 

$$I(pq) = I(p) + I(q)$$

Now, if we define the function G by 

$$G(p) = I(2^{-p})$$

then we see from the above that 

$$\begin{aligned} G(p+q) &= I(2^{-(p+q)}) = I(2^{-p}2^{-q}) \\ &= I(2^{-p}) + I(2^{-q}) \\ &= G(p) + G(q) \end{aligned}$$

However, it can be shown that the only (monotone) functions G that satisfy the foregoing 
functional relationship are those of the form 

$$G(p) = cp$$

for some constant c. Therefore, we must have that 

$$I(2^{-p}) = cp$$

or, letting $q = 2^{-p}$, 

$$I(q) = -c\log_2(q)$$

for some positive constant c. It is traditional to let c = 1 and to say that the information 
is measured in units of bits (short for binary digits). 

Consider now a random variable X, which must take on one of the values $x_1, \dots, x_n$ 
with respective probabilities $p_1, \dots, p_n$. As $-\log_2(p_i)$ represents the information conveyed 
by the message that X is equal to $x_i$, it follows that the expected amount of information 
that will be conveyed when the value of X is transmitted is given by 

$$H(X) = -\sum_{i=1}^n p_i\log_2(p_i)$$

The quantity H(X) is known in information theory as the entropy of the random 
variable X. ■ 
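The two formulas of Example 4.4c translate directly into code. A minimal sketch measuring information in bits (c = 1, as in the text); the convention that outcomes with $p_i = 0$ contribute nothing (0 log 0 = 0) is a standard assumption added here, not something the example states.

```python
import math

def information_bits(p):
    """I(p) = -log2(p): bits conveyed by learning that an event of probability p occurred."""
    return -math.log2(p)

def entropy_bits(probs):
    """H(X) = -sum p_i log2(p_i); outcomes with p_i = 0 are skipped (0 log 0 = 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(information_bits(1 / 36), information_bits(1 / 6))  # two dice: sum is 12 vs. sum is 7
print(entropy_bits([0.5, 0.5]))                           # fair coin: 1.0 bit
print(entropy_bits([1 / 6] * 6))                          # fair die: log2(6) bits
```

The additivity requirement I(pq) = I(p) + I(q) is satisfied automatically, since the logarithm of a product is the sum of the logarithms.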

We can also define the expectation of a continuous random variable. Suppose that X is 
a continuous random variable with probability density function f. Since, for dx small, 

$$f(x)\,dx \approx P\{x < X < x + dx\}$$

it follows that a weighted average of all possible values of X, with the weight given to x 
equal to the probability that X is near x, is just the integral over all x of x f(x) dx. Hence, 
it is natural to define the expected value of X by 

$$E[X] = \int_{-\infty}^{\infty} x f(x)\,dx$$

FIGURE 4.4 Center of gravity of the mass distribution p(-1) = .10, p(0) = .25, p(1) = .30, p(2) = .35 (center of gravity = .9). 

EXAMPLE 4.4d Suppose that you are expecting a message at some time past 5 p.m. From 
experience you know that X, the number of hours after 5 p.m. until the message arrives, 
is a random variable with the following probability density function: 

$$f(x) = \begin{cases} \frac{1}{1.5} & \text{if } 0 < x < 1.5 \\ 0 & \text{otherwise} \end{cases}$$

The expected amount of time past 5 p.m. until the message arrives is given by 

$$E[X] = \int_0^{1.5} \frac{x}{1.5}\,dx = .75$$

Hence, on average, you would have to wait three-fourths of an hour. ■ 
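The continuous definition $E[X] = \int x f(x)\,dx$ can be approximated numerically when a closed form is inconvenient. A minimal sketch using the midpoint rule (our choice of quadrature; the step count is arbitrary), applied to the density of Example 4.4d.

```python
def expected_value(density, lo, hi, steps=100_000):
    """Approximate E[X] = integral of x f(x) dx over (lo, hi) with the midpoint rule."""
    h = (hi - lo) / steps
    midpoints = (lo + (k + 0.5) * h for k in range(steps))
    return h * sum(x * density(x) for x in midpoints)

def arrival_density(x):
    """Density of Example 4.4d: uniform on (0, 1.5) hours after 5 p.m."""
    return 1 / 1.5 if 0 < x < 1.5 else 0.0

print(expected_value(arrival_density, 0.0, 1.5))  # close to .75
```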

REMARKS 

(a) The concept of expectation is analogous to the physical concept of the center of gravity 
of a distribution of mass. Consider a discrete random variable X having probability mass 
function $p(x_i)$, i ≥ 1. If we now imagine a weightless rod in which weights with mass 
$p(x_i)$, i ≥ 1, are located at the points $x_i$, i ≥ 1 (see Figure 4.4), then the point at which 
the rod would be in balance is known as the center of gravity. For those readers acquainted 
with elementary statics, it is now a simple matter to show that this point is at E[X].* 

(b) E[X] has the same units of measurement as does X. 

* To prove this, we must show that the sum of the torques tending to turn the point around E[X] is equal to 0. That 
is, we must show that $0 = \sum_i (x_i - E[X])p(x_i)$, which is immediate. 

4.5 PROPERTIES OF THE EXPECTED VALUE 

Suppose now that we are given a random variable X and its probability distribution (that 
is, its probability mass function in the discrete case or its probability density function in 
the continuous case). Suppose also that we are interested in calculating, not the expected 
value of X, but the expected value of some function of X, say g(X). How do we go 
about doing this? One way is as follows. Since g(X) is itself a random variable, it must 
have a probability distribution, which should be computable from a knowledge of the 
distribution of X. Once we have obtained the distribution of g(X), we can then compute 
E[g(X)] by the definition of the expectation. 

EXAMPLE 4.5a Suppose X has the following probability mass function 

$$p(0) = .2,\quad p(1) = .5,\quad p(2) = .3$$

Calculate $E[X^2]$. 

SOLUTION Letting $Y = X^2$, we have that Y is a random variable that can take on one of 
the values $0^2, 1^2, 2^2$ with respective probabilities 

$$\begin{aligned} p_Y(0) &= P\{Y = 0^2\} = .2 \\ p_Y(1) &= P\{Y = 1^2\} = .5 \\ p_Y(4) &= P\{Y = 2^2\} = .3 \end{aligned}$$

Hence, 

$$E[X^2] = E[Y] = 0(.2) + 1(.5) + 4(.3) = 1.7$$ ■

EXAMPLE 4.5b The time, in hours, it takes to locate and repair an electrical breakdown in 
a certain factory is a random variable, call it X, whose density function is given by 

$$f(x) = \begin{cases} 1 & \text{if } 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$$

If the cost involved in a breakdown of duration x is $x^3$, what is the expected cost of such 
a breakdown? 

SOLUTION Letting $Y = X^3$ denote the cost, we first calculate its distribution function as 
follows. For 0 ≤ a ≤ 1, 

$$\begin{aligned} F_Y(a) &= P\{Y \le a\} \\ &= P\{X^3 \le a\} \\ &= P\{X \le a^{1/3}\} \\ &= \int_0^{a^{1/3}} dx \\ &= a^{1/3} \end{aligned}$$

By differentiating $F_Y(a)$, we obtain the density of Y, 

$$f_Y(a) = \frac{1}{3}a^{-2/3},\qquad 0 \le a < 1$$

Hence, 

$$\begin{aligned} E[X^3] = E[Y] &= \int_{-\infty}^{\infty} a f_Y(a)\,da \\ &= \int_0^1 a\,\frac{1}{3}a^{-2/3}\,da \\ &= \frac{1}{3}\int_0^1 a^{1/3}\,da \\ &= \frac{1}{3}\cdot\frac{3}{4}a^{4/3}\Big|_0^1 \\ &= \frac{1}{4} \end{aligned}$$ ■

While the foregoing procedure will, in theory, always enable us to compute the 
expectation of any function of X from a knowledge of the distribution of X, there is an 
easier way of doing this. Suppose, for instance, that we wanted to compute the expected value 
of g(X). Since g(X) takes on the value g(x) when X = x, it seems intuitive that E[g(X)] 
should be a weighted average of the possible values g(x) with, for a given x, the weight 
given to g(x) being equal to the probability (or probability density in the continuous case) 
that X will equal x. Indeed, the foregoing can be shown to be true and we thus have the 
following proposition. 

PROPOSITION 4.5.1 EXPECTATION OF A FUNCTION OF A RANDOM VARIABLE 

(a) If X is a discrete random variable with probability mass function p(x), then for any 
real-valued function g, 

$$E[g(X)] = \sum_x g(x)p(x)$$

(b) If X is a continuous random variable with probability density function f(x), then 
for any real-valued function g, 

$$E[g(X)] = \int_{-\infty}^{\infty} g(x)f(x)\,dx$$

EXAMPLE 4.5c Applying Proposition 4.5.1 to Example 4.5a yields 

$$E[X^2] = 0^2(0.2) + (1^2)(0.5) + (2^2)(0.3) = 1.7$$

which, of course, checks with the result derived in Example 4.5a. ■ 
EXAMPLE 4.5d Applying the proposition to Example 4.5b yields 

$$\begin{aligned} E[X^3] &= \int_0^1 x^3\,dx \qquad (\text{since } f(x) = 1,\ 0 < x < 1) \\ &= \frac{1}{4} \end{aligned}$$ ■
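Both routes to $E[X^2]$ in Examples 4.5a and 4.5c can be compared mechanically: build the distribution of g(X) and take its mean, or apply Proposition 4.5.1(a) directly. A minimal sketch using exact fractions; the helper names are illustrative. `pmf_of_function` merges the values of x that map to the same g(x), which is the step that makes the first route laborious when g is not one-to-one.

```python
from collections import defaultdict
from fractions import Fraction

pmf = {0: Fraction(2, 10), 1: Fraction(5, 10), 2: Fraction(3, 10)}  # Example 4.5a

def square(x):
    return x * x

def expectation(pmf_):
    """E[X] = sum of x p(x) over the support."""
    return sum(x * p for x, p in pmf_.items())

def pmf_of_function(pmf_, g):
    """Distribution of Y = g(X): add P{X = x} over every x with g(x) = y."""
    out = defaultdict(Fraction)
    for x, p in pmf_.items():
        out[g(x)] += p
    return dict(out)

via_distribution = expectation(pmf_of_function(pmf, square))  # route of Example 4.5a
via_proposition = sum(square(x) * p for x, p in pmf.items())  # Proposition 4.5.1(a)
print(via_distribution, via_proposition)  # 17/10 17/10
```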

An immediate corollary of Proposition 4.5.1 is the following. 

Corollary 4.5.2 

If a and b are constants, then 

$$E[aX + b] = aE[X] + b$$

Proof 

In the discrete case, 

$$\begin{aligned} E[aX+b] &= \sum_x (ax+b)p(x) \\ &= a\sum_x xp(x) + b\sum_x p(x) \\ &= aE[X] + b \end{aligned}$$

In the continuous case, 

$$\begin{aligned} E[aX+b] &= \int_{-\infty}^{\infty}(ax+b)f(x)\,dx \\ &= a\int_{-\infty}^{\infty} xf(x)\,dx + b\int_{-\infty}^{\infty} f(x)\,dx \\ &= aE[X] + b \end{aligned}$$ □

If we take a = 0 in Corollary 4.5.2, we see that 

$$E[b] = b$$

That is, the expected value of a constant is just its value. (Is this intuitive?) Also, if we take 
b = 0, then we obtain 

$$E[aX] = aE[X]$$

or, in words, the expected value of a constant multiplied by a random variable is just the 
constant times the expected value of the random variable. The expected value of a random 
variable X, E[X], is also referred to as the mean or the first moment of X. The quantity 
$E[X^n]$, n ≥ 1, is called the nth moment of X. By Proposition 4.5.1, we note that 

$$E[X^n] = \begin{cases} \sum_x x^n p(x) & \text{if } X \text{ is discrete} \\ \int_{-\infty}^{\infty} x^n f(x)\,dx & \text{if } X \text{ is continuous} \end{cases}$$

4.5.1 Expected Value of Sums of Random Variables 

The two-dimensional version of Proposition 4.5.1 states that if X and Y are random 
variables and g is a function of two variables, then 

$$E[g(X,Y)] = \begin{cases} \sum_y\sum_x g(x,y)p(x,y) & \text{in the discrete case} \\ \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x,y)f(x,y)\,dx\,dy & \text{in the continuous case} \end{cases}$$

For example, if g(X, Y) = X + Y, then, in the continuous case, 

$$\begin{aligned} E[X+Y] &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}(x+y)f(x,y)\,dx\,dy \\ &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xf(x,y)\,dx\,dy + \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} yf(x,y)\,dx\,dy \\ &= \int_{-\infty}^{\infty} xf_X(x)\,dx + \int_{-\infty}^{\infty} yf_Y(y)\,dy \\ &= E[X] + E[Y] \end{aligned}$$

where the third line uses Equations 4.3.5 and 4.3.6 to integrate out the other variable. 
A similar result can be shown in the discrete case and indeed, for any random variables X 
and Y, 

$$E[X+Y] = E[X] + E[Y] \tag{4.5.1}$$

By repeatedly applying Equation 4.5.1 we can show that the expected value of the sum 
of any number of random variables equals the sum of their individual expectations. 
For instance, 

$$\begin{aligned} E[X+Y+Z] &= E[(X+Y)+Z] \\ &= E[X+Y] + E[Z] && \text{by Equation 4.5.1} \\ &= E[X] + E[Y] + E[Z] && \text{again by Equation 4.5.1} \end{aligned}$$

And in general, for any n, 

$$E[X_1 + X_2 + \cdots + X_n] = E[X_1] + E[X_2] + \cdots + E[X_n] \tag{4.5.2}$$

Equation 4.5.2 is an extremely useful formula whose utility will now be illustrated by 
a series of examples. 

EXAMPLE 4.5e A construction firm has recently sent in bids for 3 jobs worth (in profits) 
10, 20, and 40 (thousand) dollars. If its probabilities of winning the jobs are respectively 
.2, .8, and .3, what is the firm's expected total profit? 

SOLUTION Letting $X_i$, i = 1, 2, 3 denote the firm's profit from job i, then 

$$\text{total profit} = X_1 + X_2 + X_3$$

and so 

$$E[\text{total profit}] = E[X_1] + E[X_2] + E[X_3]$$

Now 

$$\begin{aligned} E[X_1] &= 10(.2) + 0(.8) = 2 \\ E[X_2] &= 20(.8) + 0(.2) = 16 \\ E[X_3] &= 40(.3) + 0(.7) = 12 \end{aligned}$$

and thus the firm's expected total profit is 30 thousand dollars. ■ 

EXAMPLE 4.5f A secretary has typed N letters along with their respective envelopes. The 
envelopes get mixed up when they fall on the floor. If the letters are placed in the mixed-up 
envelopes in a completely random manner (that is, each letter is equally likely to end up 
in any of the envelopes), what is the expected number of letters that are placed in the 
correct envelopes? 

SOLUTION Letting X denote the number of letters that are placed in the correct envelope, 
we can most easily compute E[X] by noting that 

$$X = X_1 + X_2 + \cdots + X_N$$

where 

$$X_i = \begin{cases} 1 & \text{if the } i\text{th letter is placed in its proper envelope} \\ 0 & \text{otherwise} \end{cases}$$

Now, since the ith letter is equally likely to be put in any of the N envelopes, it follows 
that 

$$P\{X_i = 1\} = P\{i\text{th letter is in its proper envelope}\} = 1/N$$

and so 

$$E[X_i] = 1P\{X_i = 1\} + 0P\{X_i = 0\} = 1/N$$

Hence, from Equation 4.5.2 we obtain that 

$$E[X] = E[X_1] + \cdots + E[X_N] = \left(\frac{1}{N}\right)N = 1$$

Hence, no matter how many letters there are, on the average, exactly one of the letters will 
be in its own envelope. ■ 
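The conclusion of Example 4.5f, that the expected number of matches is 1 for every N, is easy to check by simulation. A minimal sketch, assuming that a uniformly random shuffle (`random.shuffle`) models the "completely random" placement; the trial count and seed are illustrative.

```python
import random

def average_matches(n_letters, n_trials=20_000, seed=4):
    """Average number of letters landing in their own envelope under a random shuffle."""
    rng = random.Random(seed)
    envelopes = list(range(n_letters))
    total = 0
    for _ in range(n_trials):
        rng.shuffle(envelopes)  # each reshuffle is again a uniformly random assignment
        total += sum(i == e for i, e in enumerate(envelopes))
    return total / n_trials

for n in (2, 5, 50):
    print(n, average_matches(n))  # each close to 1, regardless of n
```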

EXAMPLE 4.5g Suppose there are 20 different types of coupons and suppose that each time 
one obtains a coupon it is equally likely to be any one of the types. Compute the expected 
number of different types that are contained in a set of 10 coupons. 

SOLUTION Let X denote the number of different types in the set of 10 coupons. We 
compute E[X] by using the representation 

$$X = X_1 + \cdots + X_{20}$$

where 

$$X_i = \begin{cases} 1 & \text{if at least one type } i \text{ coupon is contained in the set of 10} \\ 0 & \text{otherwise} \end{cases}$$

Now 

$$\begin{aligned} E[X_i] &= P\{X_i = 1\} \\ &= P\{\text{at least one type } i \text{ coupon is in the set of 10}\} \\ &= 1 - P\{\text{no type } i \text{ coupons are contained in the set of 10}\} \\ &= 1 - \left(\tfrac{19}{20}\right)^{10} \end{aligned}$$

where the last equality follows since each of the 10 coupons will (independently) not be 
a type i with probability 19/20. Hence, 

$$E[X] = E[X_1] + \cdots + E[X_{20}] = 20\left[1 - \left(\tfrac{19}{20}\right)^{10}\right] = 8.025$$ ■
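The indicator argument of Example 4.5g gives a closed form that is easy to evaluate and to compare with a simulation. A minimal sketch; the function names, seed, and trial count are illustrative, and coupon types are modeled as the integers 0 through 19 drawn uniformly with `random.randrange`.

```python
import random

def expected_distinct_types(n_types=20, n_coupons=10):
    """Exact E[X] = n_types * [1 - ((n_types - 1) / n_types) ** n_coupons] via indicators."""
    return n_types * (1 - ((n_types - 1) / n_types) ** n_coupons)

def simulated_distinct_types(n_types=20, n_coupons=10, n_trials=100_000, seed=5):
    """Average number of distinct types among n_coupons equally likely coupons."""
    rng = random.Random(seed)
    total = sum(len({rng.randrange(n_types) for _ in range(n_coupons)})
                for _ in range(n_trials))
    return total / n_trials

print(round(expected_distinct_types(), 3))  # 8.025
print(simulated_distinct_types())           # close to 8.025
```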



An important property of the mean arises when one must predict the value of a random 
variable. That is, suppose that the value of a random variable X is to be predicted. If 
we predict that X will equal c, then the square of the "error" involved will be $(X - c)^2$. 
We will now show that the average squared error is minimized when we predict that X 
will equal its mean μ. To see this, note that for any constant c 

$$\begin{aligned} E[(X-c)^2] &= E[(X-\mu+\mu-c)^2] \\ &= E[(X-\mu)^2 + 2(\mu-c)(X-\mu) + (\mu-c)^2] \\ &= E[(X-\mu)^2] + 2(\mu-c)E[X-\mu] + (\mu-c)^2 \\ &= E[(X-\mu)^2] + (\mu-c)^2 \qquad \text{since } E[X-\mu] = E[X]-\mu = 0 \\ &\ge E[(X-\mu)^2] \end{aligned}$$

Hence, the best predictor of a random variable, in terms of minimizing its mean square 
error, is just its mean. 
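The identity $E[(X-c)^2] = E[(X-\mu)^2] + (\mu - c)^2$ can be checked exactly for a concrete distribution. The sketch below assumes a fair die and scans a grid of candidate predictions c in steps of 1/4 (an arbitrary illustrative grid); the minimizer found is the mean 7/2.

```python
from fractions import Fraction

die = {i: Fraction(1, 6) for i in range(1, 7)}  # fair die, as in Example 4.4a

def mean_squared_error(pmf, c):
    """E[(X - c)^2] for a discrete pmf {x: P{X = x}}."""
    return sum((x - c) ** 2 * p for x, p in pmf.items())

mu = sum(x * p for x, p in die.items())          # E[X] = 7/2
candidates = [Fraction(k, 4) for k in range(29)]  # predictions 0, 1/4, ..., 7
best = min(candidates, key=lambda c: mean_squared_error(die, c))
print(mu, best)  # 7/2 7/2
```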

4.6 VARIANCE 

Given a random variable X along with its probability distribution function, it would be 
extremely useful if we were able to summarize the essential properties of the mass function 
by certain suitably defined measures. One such measure would be E[X], the expected value 
of X. However, while E[X] yields the weighted average of the possible values of X, it does 
not tell us anything about the variation, or spread, of these values. For instance, while the 
following random variables W, Y, and Z having probability mass functions determined by 

$$W = 0 \text{ with probability } 1$$

$$Y = \begin{cases} -1 & \text{with probability } \frac{1}{2} \\ +1 & \text{with probability } \frac{1}{2} \end{cases}$$

$$Z = \begin{cases} -100 & \text{with probability } \frac{1}{2} \\ +100 & \text{with probability } \frac{1}{2} \end{cases}$$

all have the same expectation, namely 0, there is much greater spread in the possible 
values of Y than in those of W (which is a constant) and in the possible values of Z than 
in those of Y. 

Because we expect X to take on values around its mean E[X], it would appear that 
a reasonable way of measuring the possible variation of X would be to look at how far 
apart X would be from its mean on the average. One possible way to measure this would 
be to consider the quantity $E[|X - \mu|]$, where μ = E[X], and $|X - \mu|$ represents the 
absolute value of X - μ. However, it turns out to be mathematically inconvenient to deal 
with this quantity and so a more tractable quantity is usually considered, namely, the 
expectation of the square of the difference between X and its mean. We thus have the 
following definition. 

Definition 

If X is a random variable with mean μ, then the variance of X, denoted by Var(X), is 
defined by 

$$\mathrm{Var}(X) = E[(X-\mu)^2]$$

An alternative formula for Var(X) can be derived as follows: 

$$\begin{aligned} \mathrm{Var}(X) &= E[(X-\mu)^2] \\ &= E[X^2 - 2\mu X + \mu^2] \\ &= E[X^2] - E[2\mu X] + E[\mu^2] \\ &= E[X^2] - 2\mu E[X] + \mu^2 \\ &= E[X^2] - \mu^2 \end{aligned}$$

That is, 

$$\mathrm{Var}(X) = E[X^2] - (E[X])^2 \tag{4.6.1}$$

or, in words, the variance of X is equal to the expected value of the square of X minus the 
square of the expected value of X. This is, in practice, often the easiest way to compute 
Var(X). 

EXAMPLE 4.6a Compute Var(X) when X represents the outcome when we roll a fair die. 
SOLUTION Since P{X = i} = 1/6, i = 1, 2, 3, 4, 5, 6, we obtain 

$$\begin{aligned} E[X^2] &= \sum_{i=1}^6 i^2 P\{X = i\} \\ &= 1^2\left(\tfrac{1}{6}\right) + 2^2\left(\tfrac{1}{6}\right) + 3^2\left(\tfrac{1}{6}\right) + 4^2\left(\tfrac{1}{6}\right) + 5^2\left(\tfrac{1}{6}\right) + 6^2\left(\tfrac{1}{6}\right) \\ &= \tfrac{91}{6} \end{aligned}$$

Hence, since it was shown in Example 4.4a that E[X] = 7/2, we obtain from Equation 
4.6.1 that 

$$\mathrm{Var}(X) = E[X^2] - (E[X])^2 = \frac{91}{6} - \left(\frac{7}{2}\right)^2 = \frac{35}{12}$$ ■
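Example 4.6a can be reproduced exactly with rational arithmetic, computing Var(X) both from Equation 4.6.1 and from the definition $E[(X-\mu)^2]$ to confirm that they agree. A minimal sketch using Python's `fractions` module.

```python
from fractions import Fraction

die = {i: Fraction(1, 6) for i in range(1, 7)}

mean = sum(x * p for x, p in die.items())                          # E[X] = 7/2
second_moment = sum(x ** 2 * p for x, p in die.items())            # E[X^2] = 91/6
var_shortcut = second_moment - mean ** 2                           # Equation 4.6.1
var_definition = sum((x - mean) ** 2 * p for x, p in die.items())  # E[(X - mu)^2]
print(second_moment, var_shortcut, var_definition)  # 91/6 35/12 35/12
```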

EXAMPLE 4.6b Variance of an Indicator Random Variable. If, for some event A, 

$$I = \begin{cases} 1 & \text{if event } A \text{ occurs} \\ 0 & \text{if event } A \text{ does not occur} \end{cases}$$

then 

$$\begin{aligned} \mathrm{Var}(I) &= E[I^2] - (E[I])^2 \\ &= E[I] - (E[I])^2 && \text{since } I^2 = I \text{ (as } 1^2 = 1 \text{ and } 0^2 = 0) \\ &= E[I](1 - E[I]) \\ &= P(A)[1 - P(A)] && \text{since } E[I] = P(A) \text{ from Example 4.4b} \end{aligned}$$ ■

A useful identity concerning variances is that for any constants a and b, 

$$\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X) \tag{4.6.2}$$

To prove Equation 4.6.2, let μ = E[X] and recall that E[aX + b] = aμ + b. Thus, by 
the definition of variance, we have 

$$\begin{aligned} \mathrm{Var}(aX+b) &= E[(aX + b - E[aX+b])^2] \\ &= E[(aX + b - a\mu - b)^2] \\ &= E[(aX - a\mu)^2] \\ &= E[a^2(X-\mu)^2] \\ &= a^2E[(X-\mu)^2] \\ &= a^2\,\mathrm{Var}(X) \end{aligned}$$

Specifying particular values for a and b in Equation 4.6.2 leads to some interesting 
corollaries. For instance, by setting a = 0 in Equation 4.6.2 we obtain that 

$$\mathrm{Var}(b) = 0$$

That is, the variance of a constant is 0. (Is this intuitive?) Similarly, by setting a = 1 we 
obtain 

$$\mathrm{Var}(X + b) = \mathrm{Var}(X)$$

That is, the variance of a constant plus a random variable is equal to the variance of the 
random variable. (Is this intuitive? Think about it.) Finally, setting b = 0 yields 

$$\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$$

The quantity $\sqrt{\mathrm{Var}(X)}$ is called the standard deviation of X. The standard deviation 
has the same units as does the mean. 
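Equation 4.6.2 and its special cases (a = 0, a = 1, b = 0) can be verified exactly on a concrete pmf. The sketch below assumes the fair-die pmf of Example 4.6a; `variance` and `affine` are illustrative helper names.

```python
from fractions import Fraction

def variance(pmf):
    """Var(X) = E[X^2] - (E[X])^2 for a pmf {x: P{X = x}} (Equation 4.6.1)."""
    mean = sum(x * p for x, p in pmf.items())
    return sum(x * x * p for x, p in pmf.items()) - mean ** 2

def affine(pmf, a, b):
    """pmf of aX + b; values that coincide (e.g. when a = 0) have their probabilities merged."""
    out = {}
    for x, p in pmf.items():
        out[a * x + b] = out.get(a * x + b, 0) + p
    return out

die = {i: Fraction(1, 6) for i in range(1, 7)}
for a, b in [(3, -2), (0, 5), (1, 10), (-2, 0)]:
    print(a, b, variance(affine(die, a, b)), a ** 2 * variance(die))  # last two columns agree
```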

REMARK 

Analogous to the mean's being the center of gravity of a distribution of mass, the variance 
represents, in the terminology of mechanics, the moment of inertia. 

4.7 COVARIANCE AND VARIANCE OF SUMS OF 
RANDOM VARIABLES 

We showed in Section 4.5 that the expectation of a sum of random variables is equal to 
the sum of their expectations. The corresponding result for variances is, however, not 
generally valid. Consider 

$$\begin{aligned} \mathrm{Var}(X + X) &= \mathrm{Var}(2X) \\ &= 2^2\,\mathrm{Var}(X) \\ &= 4\,\mathrm{Var}(X) \\ &\ne \mathrm{Var}(X) + \mathrm{Var}(X) \end{aligned}$$

There is, however, an important case in which the variance of a sum of random vari- 
ables is equal to the sum of the variances; and this is when the random variables are 
independent. Before proving this, however, let us define the concept of the covariance of 
two random variables. 



Definition 

The covariance of two random variables X and Y, written Cov(X, Y), is defined by 

$$\mathrm{Cov}(X,Y) = E[(X-\mu_x)(Y-\mu_y)]$$

where $\mu_x$ and $\mu_y$ are the means of X and Y, respectively. 

A useful expression for Cov(X, Y) can be obtained by expanding the right side of the 
definition. This yields 

$$\begin{aligned} \mathrm{Cov}(X,Y) &= E[XY - \mu_x Y - \mu_y X + \mu_x\mu_y] \\ &= E[XY] - \mu_x E[Y] - \mu_y E[X] + \mu_x\mu_y \\ &= E[XY] - \mu_x\mu_y - \mu_y\mu_x + \mu_x\mu_y \\ &= E[XY] - E[X]E[Y] \end{aligned} \tag{4.7.1}$$

From its definition we see that covariance satisfies the following properties: 

$$\mathrm{Cov}(X,Y) = \mathrm{Cov}(Y,X) \tag{4.7.2}$$

and 

$$\mathrm{Cov}(X,X) = \mathrm{Var}(X) \tag{4.7.3}$$

Another property of covariance, which immediately follows from its definition, is that, for 
any constant a, 

$$\mathrm{Cov}(aX,Y) = a\,\mathrm{Cov}(X,Y) \tag{4.7.4}$$

The proof of Equation 4.7.4 is left as an exercise. 

Covariance, like expectation, possesses an additive property. 

Lemma 4.7.1 

$$\mathrm{Cov}(X+Z,Y) = \mathrm{Cov}(X,Y) + \mathrm{Cov}(Z,Y)$$

Proof 

$$\begin{aligned} \mathrm{Cov}(X+Z,Y) &= E[(X+Z)Y] - E[X+Z]E[Y] && \text{from Equation 4.7.1} \\ &= E[XY] + E[ZY] - (E[X]+E[Z])E[Y] \\ &= E[XY] - E[X]E[Y] + E[ZY] - E[Z]E[Y] \\ &= \mathrm{Cov}(X,Y) + \mathrm{Cov}(Z,Y) \end{aligned}$$ □

Lemma 4.7.1 can be easily generalized (see Problem 48) to show that 

$$\mathrm{Cov}\left(\sum_{i=1}^n X_i,\ Y\right) = \sum_{i=1}^n \mathrm{Cov}(X_i,Y) \tag{4.7.5}$$

which gives rise to the following. 
PROPOSITION 4.7.2 

$$\mathrm{Cov}\left(\sum_{i=1}^n X_i,\ \sum_{j=1}^m Y_j\right) = \sum_{i=1}^n\sum_{j=1}^m \mathrm{Cov}(X_i,Y_j)$$

Proof 

$$\begin{aligned} \mathrm{Cov}\left(\sum_{i=1}^n X_i,\ \sum_{j=1}^m Y_j\right) &= \sum_{i=1}^n \mathrm{Cov}\left(X_i,\ \sum_{j=1}^m Y_j\right) && \text{from Equation 4.7.5} \\ &= \sum_{i=1}^n \mathrm{Cov}\left(\sum_{j=1}^m Y_j,\ X_i\right) && \text{by the symmetry property Equation 4.7.2} \\ &= \sum_{i=1}^n\sum_{j=1}^m \mathrm{Cov}(Y_j,X_i) && \text{again from Equation 4.7.5} \end{aligned}$$

and the result now follows by again applying the property Equation 4.7.2. □ 

Using Equation 4.7.3 gives rise to the following formula for the variance of a sum of 
random variables. 

Corollary 4.7.3 

$$\mathrm{Var}\left(\sum_{i=1}^n X_i\right) = \sum_{i=1}^n \mathrm{Var}(X_i) + \sum_{i=1}^n\sum_{\substack{j=1\\ j\ne i}}^n \mathrm{Cov}(X_i,X_j)$$

Proof 

The proof follows directly from Proposition 4.7.2 upon setting m = n, and $Y_j = X_j$ for 
j = 1, ..., n. □ 

In the case of n = 2, Corollary 4.7.3 yields that 

$$\mathrm{Var}(X+Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + \mathrm{Cov}(X,Y) + \mathrm{Cov}(Y,X)$$

or, using Equation 4.7.2, 

$$\mathrm{Var}(X+Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X,Y) \tag{4.7.6}$$
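Equation 4.7.6 can be checked on the joint pmf of Example 4.3g, where X and Y are dependent, so the covariance term is genuinely needed. A minimal sketch with exact fractions; `expect` is an illustrative helper implementing the two-dimensional version of Proposition 4.5.1 from Section 4.5.1.

```python
from fractions import Fraction

joint = {(0, 0): Fraction(4, 10), (0, 1): Fraction(2, 10),
         (1, 0): Fraction(1, 10), (1, 1): Fraction(3, 10)}  # pmf of Example 4.3g

def expect(g):
    """E[g(X, Y)] = sum over (x, y) of g(x, y) p(x, y)."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

var_x = expect(lambda x, y: x * x) - expect(lambda x, y: x) ** 2
var_y = expect(lambda x, y: y * y) - expect(lambda x, y: y) ** 2
cov_xy = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)  # Eq. 4.7.1
var_sum = expect(lambda x, y: (x + y) ** 2) - expect(lambda x, y: x + y) ** 2
print(var_sum, var_x + var_y + 2 * cov_xy)  # 69/100 69/100
```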

Theorem 4.7.4 

If X and Y are independent random variables, then 

$$\mathrm{Cov}(X,Y) = 0$$

and so for independent $X_1, \dots, X_n$, 

$$\mathrm{Var}\left(\sum_{i=1}^n X_i\right) = \sum_{i=1}^n \mathrm{Var}(X_i)$$

Proof 

By Equation 4.7.1, we need to prove that E[XY] = E[X]E[Y]; the statement about variances 
then follows from Corollary 4.7.3. Now, in the discrete case, 

$$\begin{aligned} E[XY] &= \sum_j\sum_i x_i y_j P\{X = x_i, Y = y_j\} \\ &= \sum_j\sum_i x_i y_j P\{X = x_i\}P\{Y = y_j\} && \text{by independence} \\ &= \sum_j y_j P\{Y = y_j\}\sum_i x_i P\{X = x_i\} \\ &= E[Y]E[X] \end{aligned}$$

Because a similar argument holds in all other cases, the result is proven. □ 

EXAMPLE 4.7a Compute the variance of the sum obtained when 10 independent rolls of 
a fair die are made. 

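SOLUTION Letting $X_i$ denote the outcome of the $i$th roll, we have, by Theorem 4.7.4, that

$$\begin{aligned}
\mathrm{Var}\Big(\sum_{i=1}^{10} X_i\Big) &= \sum_{i=1}^{10} \mathrm{Var}(X_i)\\
&= 10 \cdot \tfrac{35}{12} && \text{from Example 4.6a}\\
&= \tfrac{175}{6}
\end{aligned}$$

■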
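One way to check Example 4.7a without the variance formula is to compute the exact distribution of the sum of 10 independent rolls by repeatedly convolving the die's pmf, and then take the variance of that distribution directly. This is a sketch with exact fractions; `convolve` and `mean_var` are illustrative helper names.

```python
from fractions import Fraction

def convolve(p, q):
    """Pmf of the sum of two independent integer-valued random variables."""
    r = {}
    for a, pa in p.items():
        for b, qb in q.items():
            r[a + b] = r.get(a + b, 0) + pa * qb
    return r

def mean_var(pmf):
    """Mean and variance of a pmf given as {value: probability}."""
    m = sum(x * p for x, p in pmf.items())
    return m, sum((x - m) ** 2 * p for x, p in pmf.items())

die = {k: Fraction(1, 6) for k in range(1, 7)}
total = {0: Fraction(1)}          # pmf of an empty sum
for _ in range(10):
    total = convolve(total, die)  # add one more independent roll

m, v = mean_var(total)
print(m, v)  # 35 175/6
```

The convolution never enumerates the 6^10 individual outcomes; it only tracks the at most 61 possible totals.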

EXAMPLE 4.7b Compute the variance of the number of heads resulting from 10 independent tosses of a fair coin.

SOLUTION Letting

$$I_j = \begin{cases} 1 & \text{if the } j\text{th toss lands heads}\\ 0 & \text{if the } j\text{th toss lands tails}\end{cases}$$

then the total number of heads is equal to

$$\sum_{j=1}^{10} I_j$$

Hence, from Theorem 4.7.4,

$$\mathrm{Var}\Big(\sum_{j=1}^{10} I_j\Big) = \sum_{j=1}^{10} \mathrm{Var}(I_j)$$

Now, since $I_j$ is an indicator random variable for an event having probability $\tfrac12$, it follows from Example 4.6b that

$$\mathrm{Var}(I_j) = \tfrac12\Big(1 - \tfrac12\Big) = \tfrac14$$

and thus

$$\mathrm{Var}\Big(\sum_{j=1}^{10} I_j\Big) = \tfrac{10}{4} = \tfrac52$$

■
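Because there are only 2^10 = 1024 equally likely head/tail sequences, Example 4.7b can also be checked by brute force. The sketch below enumerates every sequence (1 for heads, 0 for tails) and computes the mean and variance of the number of heads exactly.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product((0, 1), repeat=10))   # 1 = heads; all 2**10 equally likely
p = Fraction(1, len(outcomes))                # probability of each sequence
heads = [sum(o) for o in outcomes]            # number of heads in each sequence

mean = sum(h * p for h in heads)
var = sum((h - mean) ** 2 * p for h in heads)
print(mean, var)  # 5 5/2
```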

The covariance of two random variables is important as an indicator of the relationship 
between them. For instance, consider the situation where X and Y are indicator variables 
for whether or not the events A and B occur. That is, for events A and B, define 



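$$X = \begin{cases}1 & \text{if } A \text{ occurs}\\ 0 & \text{otherwise}\end{cases} \qquad Y = \begin{cases}1 & \text{if } B \text{ occurs}\\ 0 & \text{otherwise}\end{cases}$$

and note that

$$XY = \begin{cases}1 & \text{if } X = 1,\ Y = 1\\ 0 & \text{otherwise}\end{cases}$$

Thus,

$$\begin{aligned}
\mathrm{Cov}(X,\,Y) &= E[XY] - E[X]E[Y]\\
&= P\{X = 1,\, Y = 1\} - P\{X = 1\}P\{Y = 1\}
\end{aligned}$$

From this we see that

$$\begin{aligned}
\mathrm{Cov}(X,\,Y) > 0 &\iff P\{X = 1,\, Y = 1\} > P\{X = 1\}P\{Y = 1\}\\
&\iff \frac{P\{X = 1,\, Y = 1\}}{P\{X = 1\}} > P\{Y = 1\}\\
&\iff P\{Y = 1 \mid X = 1\} > P\{Y = 1\}
\end{aligned}$$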

That is, the covariance of X and Y is positive if and only if the outcome X = 1 makes it more likely that Y = 1 (which, as is easily seen by symmetry, also implies the reverse).
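To make the indicator argument concrete, the sketch below uses one roll of a fair die (an assumed example) with A = "the roll is even" and B = "the roll is at least 4". It computes the covariance of the two indicators and compares P{Y = 1 | X = 1} with P{Y = 1}; the function names are illustrative.

```python
from fractions import Fraction

omega = range(1, 7)   # outcomes of one roll of a fair die (assumed example)

def P(event):
    """Probability of an event (a predicate on outcomes); outcomes equally likely."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def A(w):
    return w % 2 == 0          # event A: the roll is even

def B(w):
    return w >= 4              # event B: the roll is at least 4

p_ab = P(lambda w: A(w) and B(w))   # P{X = 1, Y = 1}
cov = p_ab - P(A) * P(B)            # Cov(X, Y) for the indicators X of A and Y of B
p_b_given_a = p_ab / P(A)           # P{Y = 1 | X = 1}
print(cov, p_b_given_a, P(B))       # 1/12 2/3 1/2
```

Here the covariance is positive, and correspondingly learning that A occurred raises the probability of B from 1/2 to 2/3.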

In general, it can be shown that a positive value of Cov(X, Y) is an indication that 
Y tends to increase as X does, whereas a negative value indicates that Y tends to decrease 
as X increases. The strength of the relationship between X and Y is indicated by the correlation between X and Y, a dimensionless quantity obtained by dividing the covariance by the product of the standard deviations of X and Y. That is,

$$\mathrm{Corr}(X,\,Y) = \frac{\mathrm{Cov}(X,\,Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}$$

It can be shown (see Problem 49) that this quantity always has a value between $-1$ and $+1$.
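The correlation formula translates directly into code. The sketch below computes Corr(X, Y) for a joint pmf stored as a dictionary; the first joint pmf is a made-up dependent example, and the second puts all of its mass on the line Y = 3 − 2X, where the correlation should be exactly −1.

```python
import math

def corr(joint):
    """Corr(X, Y) = Cov(X, Y) / sqrt(Var(X) Var(Y)) for a joint pmf {(x, y): p}."""
    ex = sum(p * x for (x, y), p in joint.items())
    ey = sum(p * y for (x, y), p in joint.items())
    cov = sum(p * (x - ex) * (y - ey) for (x, y), p in joint.items())
    vx = sum(p * (x - ex) ** 2 for (x, y), p in joint.items())
    vy = sum(p * (y - ey) ** 2 for (x, y), p in joint.items())
    return cov / math.sqrt(vx * vy)

joint = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.5}   # hypothetical pmf
print(round(corr(joint), 4))   # 0.5833

line = {(x, 3 - 2 * x): 0.25 for x in range(4)}   # Y = 3 - 2X with probability 1
print(round(corr(line), 4))    # -1.0
```

Because the covariance is divided by both standard deviations, rescaling X or Y by a positive constant leaves the result unchanged, which is what makes the quantity dimensionless.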

4.8 MOMENT GENERATING FUNCTIONS 

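The moment generating function $\phi(t)$ of the random variable X is defined for all values t by

$$\phi(t) = E[e^{tX}] = \begin{cases}\displaystyle\sum_x e^{tx}\,p(x) & \text{if } X \text{ is discrete}\\[1ex] \displaystyle\int_{-\infty}^{\infty} e^{tx} f(x)\,dx & \text{if } X \text{ is continuous}\end{cases}$$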

We call $\phi(t)$ the moment generating function because all of the moments of X can be obtained by successively differentiating $\phi(t)$. For example,

$$\phi'(t) = \frac{d}{dt}E[e^{tX}] = E\Big[\frac{d}{dt}e^{tX}\Big] = E[Xe^{tX}]$$

Hence

$$\phi'(0) = E[X]$$

Similarly,

$$\phi''(t) = \frac{d}{dt}\phi'(t) = \frac{d}{dt}E[Xe^{tX}] = E\Big[\frac{d}{dt}\big(Xe^{tX}\big)\Big] = E[X^2e^{tX}]$$

and so

$$\phi''(0) = E[X^2]$$

In general, the nth derivative of $\phi(t)$ evaluated at $t = 0$ equals $E[X^n]$; that is,

$$\phi^{(n)}(0) = E[X^n], \quad n \ge 1$$
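The moment-generating property can be checked numerically. The sketch below takes a fair die as an assumed example, evaluates $\phi(t)$ from its pmf, and approximates $\phi'(0)$ and $\phi''(0)$ with central finite differences; these should be close to E[X] = 3.5 and E[X²] = 91/6. The step size h = 1e-4 is an arbitrary choice that balances truncation and rounding error.

```python
import math

def mgf(pmf, t):
    """phi(t) = E[e^{tX}] for a discrete pmf given as {x: P{X = x}}."""
    return sum(p * math.exp(t * x) for x, p in pmf.items())

die = {k: 1 / 6 for k in range(1, 7)}   # fair die, chosen for illustration
h = 1e-4                                 # step size for the finite differences
d1 = (mgf(die, h) - mgf(die, -h)) / (2 * h)                    # approximates phi'(0)
d2 = (mgf(die, h) - 2 * mgf(die, 0) + mgf(die, -h)) / h ** 2   # approximates phi''(0)
print(round(d1, 4), round(d2, 4))   # 3.5 15.1667
```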

An important property of moment generating functions is that the moment generating 
function of the sum of independent random variables is just the product of the individual 
moment generating functions. To see this, suppose that X and Y are independent and have moment generating functions $\phi_X(t)$ and $\phi_Y(t)$, respectively. Then $\phi_{X+Y}(t)$, the moment generating function of X + Y, is given by

$$\begin{aligned}
\phi_{X+Y}(t) &= E[e^{t(X+Y)}]\\
&= E[e^{tX}e^{tY}]\\
&= E[e^{tX}]\,E[e^{tY}]\\
&= \phi_X(t)\,\phi_Y(t)
\end{aligned}$$

where the next to the last equality follows from Theorem 4.7.4 since X and Y, and thus $e^{tX}$ and $e^{tY}$, are independent.
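A small numerical illustration of the product rule, assuming two independent fair dice: the sketch computes the pmf of X + Y by convolution, evaluates its moment generating function at a few values of t, and compares the result with $\phi_X(t)\phi_Y(t) = \phi_X(t)^2$.

```python
import math

def mgf(pmf, t):
    """phi(t) = E[e^{tX}] for a discrete pmf given as {x: P{X = x}}."""
    return sum(p * math.exp(t * x) for x, p in pmf.items())

die = {k: 1 / 6 for k in range(1, 7)}

# pmf of X + Y for independent X and Y, each distributed like one die roll
total = {}
for x, px in die.items():
    for y, py in die.items():
        total[x + y] = total.get(x + y, 0.0) + px * py

for t in (-0.5, 0.3, 1.0):
    # phi_{X+Y}(t) and phi_X(t) * phi_Y(t) agree up to rounding error
    print(t, mgf(total, t), mgf(die, t) ** 2)
```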

Another important result is that the moment generating function uniquely determines 
the distribution. That is, there exists a one-to-one correspondence between the moment 
generating function and the distribution function of a random variable. 

4.9 CHEBYSHEV'S INEQUALITY AND THE WEAK LAW OF 
LARGE NUMBERS 

We start this section by proving a result known as Markov's inequality. 

PROPOSITION 4.9.1 MARKOV'S INEQUALITY 

If X is a random variable that takes only nonnegative values, then for any value $a > 0$,

$$P\{X \ge a\} \le \frac{E[X]}{a}$$

Proof

We give a proof for the case where X is continuous with density f.

$$\begin{aligned}
E[X] &= \int_0^\infty x f(x)\,dx\\
&= \int_0^a x f(x)\,dx + \int_a^\infty x f(x)\,dx\\
&\ge \int_a^\infty x f(x)\,dx\\
&\ge \int_a^\infty a f(x)\,dx\\
&= a\int_a^\infty f(x)\,dx\\
&= aP\{X \ge a\}
\end{aligned}$$

and the result is proved. □ 

As a corollary, we obtain Proposition 4.9.2. 

PROPOSITION 4.9.2 CHEBYSHEV'S INEQUALITY 

If X is a random variable with mean $\mu$ and variance $\sigma^2$, then for any value $k > 0$,

$$P\{|X - \mu| \ge k\} \le \frac{\sigma^2}{k^2}$$

Proof

Since $(X - \mu)^2$ is a nonnegative random variable, we can apply Markov's inequality (with $a = k^2$) to obtain

$$P\{(X - \mu)^2 \ge k^2\} \le \frac{E[(X - \mu)^2]}{k^2} \tag{4.9.1}$$

But since $(X - \mu)^2 \ge k^2$ if and only if $|X - \mu| \ge k$, Equation 4.9.1 is equivalent to

$$P\{|X - \mu| \ge k\} \le \frac{E[(X - \mu)^2]}{k^2} = \frac{\sigma^2}{k^2}$$

and the proof is complete. □

The importance of Markov's and Chebyshev's inequalities is that they enable us to derive bounds on probabilities when only the mean, or both the mean and the variance, of the probability distribution are known. Of course, if the actual distribution were known, then the desired probabilities could be computed exactly and we would not need to resort to bounds.

EXAMPLE 4.9a Suppose that it is known that the number of items produced in a factory 
during a week is a random variable with mean 50. 

(a) What can be said about the probability that this week's production will exceed 75? 

(b) If the variance of a week's production is known to equal 25, then what can be said 
about the probability that this week's production will be between 40 and 60? 

SOLUTION Let X be the number of items that will be produced in a week: 

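(a) By Markov's inequality

$$P\{X > 75\} \le P\{X \ge 75\} \le \frac{E[X]}{75} = \frac{50}{75} = \frac{2}{3}$$

(b) By Chebyshev's inequality

$$P\{|X - 50| \ge 10\} \le \frac{\sigma^2}{10^2} = \frac{25}{100} = \frac{1}{4}$$

Hence

$$P\{|X - 50| < 10\} \ge 1 - \frac{1}{4} = \frac{3}{4}$$

and so the probability that this week's production will be between 40 and 60 is at least .75. ■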
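The two bounds in Example 4.9a are simple enough to express as functions. The sketch below is a minimal illustration using exact fractions; `markov_bound` and `chebyshev_bound` are hypothetical helper names, and a returned value larger than 1 would simply mean the bound says nothing.

```python
from fractions import Fraction

def markov_bound(mean, a):
    """Upper bound on P{X >= a} for a nonnegative X (Markov's inequality), a > 0."""
    return Fraction(mean) / a

def chebyshev_bound(variance, k):
    """Upper bound on P{|X - mu| >= k} (Chebyshev's inequality), k > 0."""
    return Fraction(variance) / k ** 2

print(markov_bound(50, 75))          # 2/3, part (a)
print(1 - chebyshev_bound(25, 10))   # 3/4, lower bound on P{40 < X < 60}, part (b)
```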

By replacing k by $k\sigma$ in Equation 4.9.1, we can write Chebyshev's inequality as

$$P\{|X - \mu| \ge k\sigma\} \le 1/k^2$$

Thus it states that the probability that a random variable differs from its mean by k or more standard deviations is bounded by $1/k^2$.
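To see how conservative the $1/k^2$ bound can be, the sketch below compares it with an exact tail probability for an exponential random variable with rate 1 (an assumed example, with mean 1 and standard deviation 1). For $k \ge 1$ the event $|X - 1| \ge k$ can only occur on the right tail, so its probability is $e^{-(1+k)}$.

```python
import math

rows = []
for k in (1.5, 2, 3):
    exact = math.exp(-(1 + k))   # P{|X - 1| >= k} for Exp(1) when k >= 1
    bound = 1 / k ** 2           # Chebyshev: P{|X - mu| >= k sigma} <= 1/k^2, sigma = 1
    rows.append((k, exact, bound))
    print(k, round(exact, 4), round(bound, 4))
```

For k = 2, for instance, the exact probability is about .05 while the bound is .25; the bound holds for every distribution with the given mean and variance, which is why it cannot be tight for this one.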

We will end this section by using Chebyshev's inequality to prove the weak law of large numbers, which states that the probability that the average of the first n terms in a sequence of independent and identically distributed random variables differs from its mean by more than $\varepsilon$ goes to 0 as n goes to infinity.

Theorem 4.9.3 The Weak Law of Large Numbers 

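Let $X_1, X_2, \ldots$ be a sequence of independent and identically distributed random variables, each having mean $E[X_i] = \mu$. Then, for any $\varepsilon > 0$,

$$P\left\{\left|\frac{X_1 + \cdots + X_n}{n} - \mu\right| > \varepsilon\right\} \to 0 \quad \text{as } n \to \infty$$

Proof

We shall prove the result only under the additional assumption that the random variables have a finite variance $\sigma^2$. Now, as

$$E\left[\frac{X_1 + \cdots + X_n}{n}\right] = \mu \quad \text{and} \quad \mathrm{Var}\left(\frac{X_1 + \cdots + X_n}{n}\right) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$$

(the variance following from Theorem 4.7.4), it follows from Chebyshev's inequality that

$$P\left\{\left|\frac{X_1 + \cdots + X_n}{n} - \mu\right| > \varepsilon\right\} \le \frac{\sigma^2}{n\varepsilon^2}$$

and the result is proved. □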
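The following Monte Carlo sketch illustrates the weak law for rolls of a fair die (mean 3.5, variance 35/12). For several sample sizes n it estimates the probability that the sample average misses 3.5 by more than ε = 0.5 and prints it next to the bound $\sigma^2/(n\varepsilon^2)$ from the proof. The function name, the number of repetitions, and the seed are arbitrary choices for illustration.

```python
import random

def tail_frequency(n, eps, reps=1000, seed=1):
    """Monte Carlo estimate of P{|(X_1 + ... + X_n)/n - 3.5| > eps} for fair-die rolls."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        mean = sum(rng.randint(1, 6) for _ in range(n)) / n
        hits += abs(mean - 3.5) > eps       # True counts as 1
    return hits / reps

sigma2, eps = 35 / 12, 0.5                  # variance of one roll, tolerance
for n in (10, 100, 400):
    bound = sigma2 / (n * eps ** 2)         # Chebyshev bound from the proof (may exceed 1)
    print(n, tail_frequency(n, eps), round(bound, 4))
```

The estimated frequencies shrink toward 0 as n grows, and typically sit well below the Chebyshev bound, which only uses the mean and variance.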

For an application of the above, suppose that a sequence of independent trials is 
performed. Let E be a fixed event and denote by P(E) the probability that E occurs 
on a given trial. Letting 



$$X_i = \begin{cases}1 & \text{if } E \text{ occurs on trial } i\\ 0 & \text{if } E \text{ does not occur on trial } i\end{cases}$$

it follows that $X_1 + X_2 + \cdots + X_n$ represents the number of times that E occurs in the first n trials. Because $E[X_i] = P(E)$, it thus follows from the weak law of large numbers that for any positive number $\varepsilon$, no matter how small, the probability that the proportion of the first n trials in which E occurs differs from P(E) by more than $\varepsilon$ goes to 0 as n increases.



Problems 



1. Five men and 5 women are ranked according to their scores on an examination. 
Assume that no two scores are alike and all 10! possible rankings are equally likely. 
Let X denote the highest ranking achieved by a woman (for instance, X = 2 if the top-ranked person was male and the next-ranked person was female). Find $P\{X = i\}$, $i = 1, 2, 3, \ldots, 8, 9, 10$.

2. Let X represent the difference between the number of heads and the number of 
tails obtained when a coin is tossed n times. What are the possible values of X? 

3. In Problem 2, if the coin is assumed fair, for n = 3, what are the probabilities associated with the values that X can take on?

4. The distribution function of the random variable X is given by

$$F(x) = \begin{cases} 0 & x < 0\\ x/2 & 0 \le x < 1\\ 2/3 & 1 \le x < 2\\ 11/12 & 2 \le x < 3\\ 1 & 3 \le x\end{cases}$$

(a) Plot this distribution function. 

(b) What is P{X > 1/2}?

(c) What is P{2 <X<4}? 

(d) What is P{X < 3}? 

(e) What is P{X = 1}?

5. Suppose you are given the distribution function F of a random variable X. Explain how you could determine P{X = 1}. (Hint: You will need to use the concept of a limit.)

6. The amount of time, in hours, that a computer functions before breaking down is a continuous random variable with probability density function given by

$$f(x) = \begin{cases}\lambda e^{-x/100} & x \ge 0\\ 0 & x < 0\end{cases}$$

What is the probability that a computer will function between 50 and 150 hours before breaking down? What is the probability that it will function less than 100 hours?

7. The lifetime in hours of a certain kind of radio tube is a random variable having a probability density function given by

$$f(x) = \begin{cases}0 & x \le 100\\ \dfrac{100}{x^2} & x > 100\end{cases}$$

What is the probability that exactly 2 of 5 such tubes in a radio set will have to be replaced within the first 150 hours of operation? Assume that the events $E_i$, $i = 1, 2, 3, 4, 5$, that the $i$th such tube will have to be replaced within this time are independent.

8. If the density function of X equals 



$$f(x) = \begin{cases}ce^{-2x} & 0 < x < \infty\\ 0 & x < 0\end{cases}$$

find c. What is P{X > 2}?

9. A bin of 5 transistors is known to contain 3 that are defective. The transistors are to be tested, one at a time, until the defective ones are identified. Denote by $N_1$ the number of tests made until the first defective is spotted and by $N_2$ the number of additional tests until the second defective is spotted; find the joint probability mass function of $N_1$ and $N_2$.

10. The joint probability density function of X and Y is given by 

$$f(x,y) = \frac{6}{7}\left(x^2 + \frac{xy}{2}\right), \quad 0 < x < 1,\ 0 < y < 2$$

(a) Verify that this is indeed a joint density function.

(b) Compute the density function of X.

(c) Find P{X > Y}.

11. Let $X_1, X_2, \ldots, X_n$ be independent random variables, each having a uniform distribution over (0, 1). Let $M = \text{maximum}(X_1, X_2, \ldots, X_n)$. Show that the distribution function of M, $F_M(\cdot)$, is given by

$$F_M(x) = x^n, \quad 0 \le x \le 1$$

What is the probability density function of M?



12. The joint density of X and Y is given by

$$f(x,y) = \begin{cases} xe^{-(x+y)} & x > 0,\ y > 0\\ 0 & \text{otherwise}\end{cases}$$

(a) Compute the density of X.

(b) Compute the density of Y.

(c) Are X and Y independent?

13. The joint density of X and Y is

$$f(x,y) = \begin{cases} 2 & 0 < x < y,\ 0 < y < 1\\ 0 & \text{otherwise}\end{cases}$$

(a) Compute the density of X.

(b) Compute the density of Y.

(c) Are X and Y independent?

14. If the joint density function of X and Y factors into one part depending only on x and one depending only on y, show that X and Y are independent. That is, if

$$f(x,y) = k(x)\,l(y), \quad -\infty < x < \infty,\ -\infty < y < \infty$$

show that X and Y are independent.

15. Is Problem 14 consistent with the results of Problems 12 and 13?

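16. Suppose that X and Y are independent continuous random variables. Show that

(a) $P\{X + Y \le a\} = \displaystyle\int_{-\infty}^{\infty} F_X(a - y)\,f_Y(y)\,dy$

(b) $P\{X \le Y\} = \displaystyle\int_{-\infty}^{\infty} F_X(y)\,f_Y(y)\,dy$

where $f_Y$ is the density function of Y, and $F_X$ is the distribution function of X.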

17. When a current I (measured in amperes) flows through a resistance R (measured in ohms), the power generated (measured in watts) is given by $W = I^2R$. Suppose that I and R are independent random variables with densities

$$f_I(x) = 6x(1 - x), \quad 0 \le x \le 1$$

$$f_R(x) = 2x, \quad 0 \le x \le 1$$

Determine the density function of W.

18. In Example 4.3b, determine the conditional probability mass function of the size 
of a randomly chosen family containing 2 girls. 

19. Compute the conditional density function of X given Y = y in (a) Problem 10 
and (b) Problem 13. 

20. Show that X and Y are independent if and only if 

(a) $p_{X|Y}(x \mid y) = p_X(x)$ in the discrete case

(b) $f_{X|Y}(x \mid y) = f_X(x)$ in the continuous case

21. Compute the expected value of the random variable in Problem 1. 

22. Compute the expected value of the random variable in Problem 3. 

23. Each night different meteorologists give us the "probability" that it will rain the 
next day. To judge how well these people predict, we will score each of them as 
follows: If a meteorologist says that it will rain with probability p, then he or she will receive a score of

$$\begin{cases}1 - (1 - p)^2 & \text{if it does rain}\\ 1 - p^2 & \text{if it does not rain}\end{cases}$$

We will then keep track of scores over a certain time span and conclude that the meteorologist with the highest average score is the best predictor of weather. Suppose now that a given meteorologist is aware of this and so wants to maximize his or her expected score. If this individual truly believes that it will rain tomorrow with probability $p^*$, what value of p should he or she assert so as to maximize the expected score?

24. An insurance company writes a policy to the effect that an amount of money A must be paid if some event E occurs within a year. If the company estimates that E will occur within a year with probability p, what should it charge the customer so that its expected profit will be 10 percent of A?

25. A total of 4 buses carrying 148 students from the same school arrive at a football 
stadium. The buses carry, respectively, 40, 33, 25, and 50 students. One of the 
students is randomly selected. Let X denote the number of students that were on 
the bus carrying this randomly selected student. One of the 4 bus drivers is also 
randomly selected. Let Y denote the number of students on her bus. 

(a) Which of E[X] or E[Y] do you think is larger? Why? 

(b) Compute E[X] and E[Y].

26. Suppose that two teams play a series of games that end when one of them has won 
i games. Suppose that each game played is, independently, won by team A with 
probability p. Find the expected number of games that are played when i = 2. Also show that this number is maximized when p = 1/2.

27. The density function of X is given by

$$f(x) = \begin{cases} a + bx^2 & 0 \le x \le 1\\ 0 & \text{otherwise}\end{cases}$$

If E[X] = 3/5, find a, b.

28. The lifetime in hours of electronic tubes is a random variable having a probability 
density function given by 

$$f(x) = a^2xe^{-ax}, \quad x \ge 0$$

Compute the expected lifetime of such a tube. 

29. Let $X_1, X_2, \ldots, X_n$ be independent random variables having the common density function

$$f(x) = \begin{cases}1 & 0 < x < 1\\ 0 & \text{otherwise}\end{cases}$$

Find (a) $E[\mathrm{Max}(X_1, \ldots, X_n)]$ and (b) $E[\mathrm{Min}(X_1, \ldots, X_n)]$.

30. Suppose that X has density function

$$f(x) = \begin{cases}1 & 0 < x < 1\\ 0 & \text{otherwise}\end{cases}$$

Compute $E[X^n]$ (a) by computing the density of $X^n$ and then using the definition of expectation and (b) by using Proposition 4.5.1.

31. The time it takes to repair a personal computer is a random variable whose density, in hours, is given by

$$f(x) = \begin{cases}\tfrac12 & 0 < x < 2\\ 0 & \text{otherwise}\end{cases}$$

The cost of the repair depends on the time it takes and is equal to $40 + 30\sqrt{x}$ when the time is x. Compute the expected cost to repair a personal computer.

32. If $E[X] = 2$ and $E[X^2] = 8$, calculate (a) $E[(2 + 4X)^2]$ and (b) $E[X^2 + (X + 1)^2]$.

33. Ten balls are randomly chosen from an urn containing 17 white and 23 black balls. Let X denote the number of white balls chosen. Compute E[X]

(a) by defining appropriate indicator variables $X_i$, $i = 1, \ldots, 10$ so that

$$X = \sum_{i=1}^{10} X_i$$

(b) by defining appropriate indicator variables $Y_i$, $i = 1, \ldots, 17$ so that

$$X = \sum_{i=1}^{17} Y_i$$

34. If X is a continuous random variable having distribution function F, then its median is defined as that value of m for which

$$F(m) = 1/2$$

Find the median of the random variables with density function

(a) $f(x) = e^{-x}$, $x \ge 0$;

(b) $f(x) = 1$, $0 \le x \le 1$.

35. The median, like the mean, is important in predicting the value of a random 
variable. Whereas it was shown in the text that the mean of a random variable 
is the best predictor from the point of view of minimizing the expected value of 
the square of the error, the median is the best predictor if one wants to minimize 
the expected value of the absolute error. That is, $E[|X - c|]$ is minimized when c is the median of the distribution function of X. Prove this result when X is continuous with distribution function F and density function f. (Hint: Write

$$\begin{aligned}
E[|X - c|] &= \int_{-\infty}^{\infty} |x - c|\,f(x)\,dx\\
&= \int_{-\infty}^{c} |x - c|\,f(x)\,dx + \int_{c}^{\infty} |x - c|\,f(x)\,dx\\
&= \int_{-\infty}^{c} (c - x)\,f(x)\,dx + \int_{c}^{\infty} (x - c)\,f(x)\,dx\\
&= cF(c) - \int_{-\infty}^{c} x f(x)\,dx + \int_{c}^{\infty} x f(x)\,dx - c[1 - F(c)]
\end{aligned}$$

Now, use calculus to find the minimizing value of c.)

36. We say that $m_p$ is the $100p$ percentile of the distribution function F if

$$F(m_p) = p$$

Find $m_p$ for the distribution having density function

$$f(x) = 2e^{-2x}, \quad x \ge 0$$

37. A community consists of 100 married couples. If during a given year 50 of the 
members of the community die, what is the expected number of marriages that 
remain intact? Assume that the set of people who die is equally likely to be any of the $\binom{200}{50}$ groups of size 50. (Hint: For $i = 1, \ldots, 100$, let $X_i = 1$ if neither member of couple $i$ dies and $X_i = 0$ otherwise.)

38. Compute the expectation and variance of the number of successes in n independent trials, each of which results in a success with probability p. Is independence necessary?

39. Suppose that X is equally likely to take on any of the values 1, 2, 3, 4. Compute (a) E[X] and (b) Var(X).

40. Let $p_i = P\{X = i\}$ and suppose that $p_1 + p_2 + p_3 = 1$. If E[X] = 2, what values of $p_1, p_2, p_3$ (a) maximize and (b) minimize Var(X)?

41. Compute the mean and variance of the number of heads that appear in 3 flips of a fair coin.

42. Argue that for any random variable X

$$E[X^2] \ge (E[X])^2$$

When does one have equality? 

43. A random variable X, which represents the weight (in ounces) of an article, has density function given by f(z),

$$f(z) = \begin{cases} z - 8 & \text{for } 8 \le z \le 9\\ 10 - z & \text{for } 9 < z \le 10\\ 0 & \text{otherwise}\end{cases}$$

(a) Calculate the mean and variance of the random variable X. 

(b) The manufacturer sells the article for a fixed price of $2.00. He guarantees 
to refund the purchase money to any customer who finds the weight of his 
article to be less than 8.25 oz. His cost of production is related to the weight 
of the article by the relation x/15 + .35. Find the expected profit per article. 

44. Suppose that the Rockwell hardness X and abrasion loss Y of a specimen (coded 
data) have a joint density given by 

$$f(u,v) = \begin{cases} u + v & \text{for } 0 < u, v < 1\\ 0 & \text{otherwise}\end{cases}$$

(a) Find the marginal densities of X and Y. 

(b) Find E(X) and Var(X). 

45. A product is classified according to the number of defects it contains and the factory that produces it. Let $X_1$ and $X_2$ be the random variables that represent the number of defects per unit (taking on possible values of 0, 1, 2, or 3) and the factory number (taking on possible values 1 or 2), respectively. The entries in the table represent the joint probability mass function of a randomly chosen product.

$$\begin{array}{c|cc}
 & X_2 = 1 & X_2 = 2\\ \hline
X_1 = 0 & 1/8 & 1/16\\
X_1 = 1 & 1/16 & 1/16\\
X_1 = 2 & 3/16 & 1/8\\
X_1 = 3 & 1/8 & 1/4
\end{array}$$

(a) Find the marginal probability distributions of $X_1$ and $X_2$.

(b) Find $E[X_1]$, $E[X_2]$, $\mathrm{Var}(X_1)$, $\mathrm{Var}(X_2)$, and $\mathrm{Cov}(X_1, X_2)$.

46. A machine makes a product that is screened (inspected 100 percent) before being shipped. The measuring instrument is such that it is difficult to read between 1 and 1½ (coded data). After the screening process takes place, the measured dimension has density

$$f(z) = \begin{cases} kz^2 & \text{for } 0 \le z \le 1\\ 1 & \text{for } 1 < z \le 1\tfrac12\\ 0 & \text{otherwise}\end{cases}$$

(a) Find the value of k.

(b) What fraction of the items will fall outside the twilight zone (fall between 0 and 1)?

(c) Find the mean and variance of this random variable.

47. Verify Equation 4.7.4. 

48. Prove Equation 4.7.5 by using mathematical induction. 

49. Let X have variance $\sigma_x^2$ and let Y have variance $\sigma_y^2$. Starting with

$$0 \le \mathrm{Var}(X/\sigma_x + Y/\sigma_y)$$

show that

$$-1 \le \mathrm{Corr}(X,\,Y)$$

Now using that

$$0 \le \mathrm{Var}(X/\sigma_x - Y/\sigma_y)$$

conclude that

$$-1 \le \mathrm{Corr}(X,\,Y) \le 1$$

Using the result that Var(Z) = 0 implies that Z is constant, argue that if Corr(X, Y) = 1 or $-1$ then X and Y are related by

$$Y = a + bX$$

where the sign of b is positive when the correlation is 1 and negative when it is $-1$.

50. Consider n independent trials, each of which results in any of the outcomes i, i = 1, 2, 3, with respective probabilities $p_1, p_2, p_3$, $\sum_{i=1}^{3} p_i = 1$. Let $N_i$ denote the number of trials that result in outcome i, and show that $\mathrm{Cov}(N_1, N_2) = -np_1p_2$. Also explain why it is intuitive that this covariance is negative. (Hint: For $i = 1, \ldots, n$, let

$$X_i = \begin{cases}1 & \text{if trial } i \text{ results in outcome 1}\\ 0 & \text{if trial } i \text{ does not result in outcome 1}\end{cases}$$

Similarly, for $j = 1, \ldots, n$, let

$$Y_j = \begin{cases}1 & \text{if trial } j \text{ results in outcome 2}\\ 0 & \text{if trial } j \text{ does not result in outcome 2}\end{cases}$$

Argue that

$$N_1 = \sum_{i=1}^{n} X_i, \qquad N_2 = \sum_{j=1}^{n} Y_j$$

Then use Proposition 4.7.2 and Theorem 4.7.4.)

51. In Example 4.5f compute $\mathrm{Cov}(X_i, X_j)$ and use this result to show that Var(X) = 1.

52. If $X_1$ and $X_2$ have the same probability distribution function, show that

$$\mathrm{Cov}(X_1 - X_2,\, X_1 + X_2) = 0$$

Note that independence is not being assumed. 

53. Suppose that X has density function 

$$f(x) = e^{-x}, \quad x \ge 0$$

Compute the moment generating function of X and use your result to determine 
its mean and variance. Check your answer for the mean by a direct calculation. 

54. If the density function of X is 

$$f(x) = 1, \quad 0 < x < 1$$

determine $E[e^{tX}]$. Differentiate to obtain $E[X^n]$ and then check your answer.

55. Suppose that X is a random variable with mean and variance both equal to 20. What can be said about P{0 < X < 40}?

56. From past experience, a professor knows that the test score of a student taking her
final examination is a random variable with mean 75. 

(a) Give an upper bound to the probability that a student's test score will 
exceed 85. 

Suppose in addition the professor knows that the variance of a student's test 
score is equal to 25. 

(b) What can be said about the probability that a student will score between 
65 and 85? 

(c) How many students would have to take the examination so as to ensure, with 
probability at least .9, that the class average would be within 5 of 75? 

57. Let X and Y have respective distribution functions $F_X$ and $F_Y$, and suppose that for some constants a and b > 0,

$$F_X(x) = F_Y\left(\frac{x - a}{b}\right)$$

(a) Determine E[X] in terms of E[Y].

(b) Determine Var(X) in terms of Var(Y).

(Hint: X has the same distribution as what other random variable?)




SPECIAL RANDOM VARIABLES 



Certain types of random variables occur over and over again in applications. In this chapter, 
we will study a variety of them. 



5.1 THE BERNOULLI AND BINOMIAL 
RANDOM VARIABLES 

Suppose that a trial, or an experiment, whose outcome can be classified as either a "success" 
or as a "failure" is performed. If we let X = 1 when the outcome is a success and X = 0 when it is a failure, then the probability mass function of X is given by

$$\begin{aligned} P\{X = 0\} &= 1 - p\\ P\{X = 1\} &= p \end{aligned} \tag{5.1.1}$$

where p, $0 \le p \le 1$, is the probability that the trial is a "success."

A random variable X is said to be a Bernoulli random variable (after the Swiss mathematician James Bernoulli) if its probability mass function is given by Equations 5.1.1 for some $p \in (0, 1)$. Its expected value is

$$E[X] = 1 \cdot P\{X = 1\} + 0 \cdot P\{X = 0\} = p$$

That is, the expectation of a Bernoulli random variable is the probability that the random 
variable equals 1. 

Suppose now that n independent trials, each of which results in a "success" with probability p and in a "failure" with probability $1 - p$, are to be performed. If X represents the number of successes that occur in the n trials, then X is said to be a binomial random variable with parameters (n, p).

The probability mass function of a binomial random variable with parameters n and p is given by

$$P\{X = i\} = \binom{n}{i} p^i (1 - p)^{n-i}, \quad i = 0, 1, \ldots, n \tag{5.1.2}$$

where $\binom{n}{i} = n!/[i!\,(n - i)!]$ is the number of different groups of i objects that can be chosen from a set of n objects. The validity of Equation 5.1.2 may be verified by first noting that the probability of any particular sequence of the n outcomes containing i successes and $n - i$ failures is, by the assumed independence of trials, $p^i(1 - p)^{n-i}$. Equation 5.1.2 then follows since there are $\binom{n}{i}$ different sequences of the n outcomes leading to i successes and $n - i$ failures, which can perhaps most easily be seen by noting that there are $\binom{n}{i}$ different selections of the i trials that result in successes. For instance, if n = 5, i = 2, then there are $\binom{5}{2}$ choices of the two trials that are to result in successes, namely, any of the outcomes

(s,s,f,f,f) (f,s,s,f,f) (f,f,s,f,s) 

(s,f,s,f,f) (f,s,f,s,f) 

(s,f,f,s,f) (f,s,f,f,s) (f,f,f,s,s) 

(s,f,f,f,s) (f,f,s,s,f) 

where the outcome (f, s, f, s, f) means, for instance, that the two successes appeared on trials 2 and 4. Since each of the $\binom{5}{2}$ outcomes has probability $p^2(1 - p)^3$, we see that the probability of a total of 2 successes in 5 independent trials is $\binom{5}{2}p^2(1 - p)^3$. Note that, by the binomial theorem, the probabilities sum to 1; that is,

$$\sum_{i=0}^{\infty} p(i) = \sum_{i=0}^{n} \binom{n}{i} p^i (1 - p)^{n-i} = [p + (1 - p)]^n = 1$$
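A direct translation of Equation 5.1.2 into code is sketched below, using Python's `math.comb` for the binomial coefficient. It also lists the success/failure sequences of length 5 with exactly two successes, recovering the count of $\binom{5}{2} = 10$, and checks exactly that the probabilities sum to 1 for an illustrative choice p = 1/3.

```python
from fractions import Fraction
from itertools import product
from math import comb

def binomial_pmf(i, n, p):
    """P{X = i} = C(n, i) p^i (1 - p)^(n - i), as in Equation 5.1.2."""
    return comb(n, i) * p ** i * (1 - p) ** (n - i)

# The success/failure sequences of length 5 containing exactly 2 successes.
seqs = [s for s in product("sf", repeat=5) if s.count("s") == 2]
print(len(seqs), comb(5, 2))   # 10 10

p = Fraction(1, 3)             # any p in (0, 1); 1/3 chosen for illustration
print(sum(binomial_pmf(i, 5, p) for i in range(6)))   # 1
```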

The probability mass functions of three binomial random variables with respective parameters (10, .5), (10, .3), and (10, .6) are presented in Figure 5.1. The first of these is symmetric about the value 5, whereas the second is somewhat weighted, or skewed, to lower values and the third to higher values.

EXAMPLE 5.1a It is known that disks produced by a certain company will be defective with probability .01 independently of each other. The company sells the disks in packages of 10 and offers a money-back guarantee that at most 1 of the 10 disks is defective. What proportion of packages is returned? If someone buys three packages, what is the probability that exactly one of them will be returned?

FIGURE 5.1 Binomial probability mass functions.

SOLUTION If X is the number of defective disks in a package, then assuming that customers always take advantage of the guarantee, it follows that X is a binomial random variable with parameters (10, .01). Hence the probability that a package will have to be replaced is

$$P\{X > 1\} = 1 - P\{X = 0\} - P\{X = 1\} = 1 - \binom{10}{0}(.01)^0(.99)^{10} - \binom{10}{1}(.01)^1(.99)^{9} \approx .0043$$

Because each package will, independently, have to be replaced with probability .0043, it follows from the law of large numbers that in the long run about .43 percent of the packages will have to be replaced.

It follows from the foregoing that the number of packages that the person will have to return is a binomial random variable with parameters n = 3 and p = .0043. Therefore, the probability that exactly one of the three packages will be returned is $\binom{3}{1}(.0043)(.9957)^2 \approx .013$. ■
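The arithmetic of Example 5.1a is easy to get wrong by hand, so the sketch below recomputes both probabilities in floating point without intermediate rounding. The helper `binomial_pmf` is an illustrative implementation of Equation 5.1.2.

```python
from math import comb

def binomial_pmf(i, n, p):
    """P{X = i} for a binomial (n, p) random variable (Equation 5.1.2)."""
    return comb(n, i) * p ** i * (1 - p) ** (n - i)

# P{more than 1 defective disk among 10}: the package is returned
p_return = 1 - binomial_pmf(0, 10, 0.01) - binomial_pmf(1, 10, 0.01)
# P{exactly one of three independent packages is returned}
p_one_of_three = binomial_pmf(1, 3, p_return)
print(round(p_return, 5), round(p_one_of_three, 4))   # 0.00427 0.0127
```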

EXAMPLE 5.1b The color of one's eyes is determined by a single pair of genes, with the gene
for brown eyes being dominant over the one for blue eyes. This means that an individual 
having two blue-eyed genes will have blue eyes, while one having either two brown-eyed 
genes or one brown-eyed and one blue-eyed gene will have brown eyes. When two people 
mate, the resulting offspring receives one randomly chosen gene from each of its parents' 
gene pair. If the eldest child of a pair of brown-eyed parents has blue eyes, what is the 
probability that exactly two of the four other children (none of whom is a twin) of this 
couple also have blue eyes? 

SOLUTION To begin, note that since the eldest child has blue eyes, it follows that both 
parents must have one blue-eyed and one brown-eyed gene. (For if either had two brown- 
eyed genes, then each child would receive at least one brown-eyed gene and would thus 
have brown eyes.) The probability that an offspring of this couple will have blue eyes is 
equal to the probability that it receives the blue-eyed gene from both parents, which is 
$(\tfrac12)(\tfrac12) = \tfrac14$. Hence, because each of the other four children will have blue eyes with probability $\tfrac14$, it follows that the probability that exactly two of them have this eye color is

$$\binom{4}{2}(1/4)^2(3/4)^2 = 27/128$$

■
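The two steps of Example 5.1b can be mirrored in a short sketch: enumerate the four equally likely gene combinations a child can receive from two parents who each carry one blue-eyed gene ("b") and one brown-eyed gene ("B"), and then apply the binomial formula with n = 4 and p = 1/4. The gene labels are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import comb

# Each parent passes on "b" (blue) or "B" (brown) with equal probability.
children = list(product("bB", "bB"))
p_blue = Fraction(sum(c == ("b", "b") for c in children), len(children))
print(p_blue)  # 1/4

# Exactly 2 of the 4 other children have blue eyes: binomial with n = 4, p = 1/4.
prob = comb(4, 2) * p_blue ** 2 * (1 - p_blue) ** 2
print(prob)  # 27/128
```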

EXAMPLE 5.1c A communications system consists of n components, each of which will, 
independently, function with probability p. The total system will be able to operate 
effectively if at least one-half of its components function. 

(a) For what values of p is a 5-component system more likely to operate effectively 
than a 3-component system? 

(b) In general, when is a 2k + 1 component system better than a 2k − 1 component system?

SOLUTION

(a) Because the number of functioning components is a binomial random variable with parameters (n, p), it follows that the probability that a 5-component system will be effective is

$$\binom{5}{3}p^3(1 - p)^2 + \binom{5}{4}p^4(1 - p) + p^5$$

whereas the corresponding probability for a 3-component system is

$$\binom{3}{2}p^2(1 - p) + p^3$$

Hence, the 5-component system is better if

$$10p^3(1 - p)^2 + 5p^4(1 - p) + p^5 > 3p^2(1 - p) + p^3$$

which reduces to

$$3(p - 1)^2(2p - 1) > 0$$

or

$$p > \tfrac12$$

(b) In general, a system with 2k + 1 components will be better than one with 2k − 1 components if (and only if) $p > \tfrac12$. To prove this, consider a system of 2k + 1 components and let X denote the number of the first 2k − 1 that function. Then

$$P_{2k+1}(\text{effective}) = P\{X \ge k + 1\} + P\{X = k\}\big(1 - (1 - p)^2\big) + P\{X = k - 1\}p^2$$

which follows since the 2k + 1 component system will be effective if either

(1) $X \ge k + 1$;

(2) X = k and at least one of the remaining 2 components function; or

(3) X = k − 1 and both of the next 2 function.

Because

$$P_{2k-1}(\text{effective}) = P\{X \ge k\} = P\{X = k\} + P\{X \ge k + 1\}$$

we obtain that

$$\begin{aligned}
P_{2k+1}(\text{effective}) - P_{2k-1}(\text{effective})
&= P\{X = k - 1\}p^2 - (1 - p)^2P\{X = k\}\\
&= \binom{2k-1}{k-1}p^{k-1}(1 - p)^k p^2 - (1 - p)^2\binom{2k-1}{k}p^k(1 - p)^{k-1}\\
&= \binom{2k-1}{k}p^k(1 - p)^k\,[p - (1 - p)] && \text{since } \binom{2k-1}{k-1} = \binom{2k-1}{k}\\
&> 0 \iff p > \tfrac12
\end{aligned}$$
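The comparison in Example 5.1c can be checked numerically. The sketch below computes the probability that at least half of n independent components function (for odd n, at least (n + 1)/2 of them), compares 5- and 3-component systems at a few values of p, and checks the closed-form difference derived in part (b) against brute force. The function names are hypothetical.

```python
import math

def p_effective(n, p):
    """P{at least half of n components function}; n odd, each works independently w.p. p."""
    need = (n + 1) // 2
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(need, n + 1))

def gain(k, p):
    """Closed form from part (b): P_{2k+1}(effective) - P_{2k-1}(effective)."""
    return math.comb(2 * k - 1, k) * (p * (1 - p)) ** k * (2 * p - 1)

for p in (0.3, 0.5, 0.7):
    print(p, round(p_effective(5, p), 4), round(p_effective(3, p), 4))

# Brute-force difference versus the closed form, for several k at p = 0.6.
for k in range(1, 6):
    diff = p_effective(2 * k + 1, 0.6) - p_effective(2 * k - 1, 0.6)
    print(k, round(diff, 6), round(gain(k, 0.6), 6))
```

The sign of `gain` is the sign of 2p − 1, which is the "if and only if p > 1/2" conclusion of part (b).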
EXAMPLE 5.1d Suppose that 10 percent of the chips produced by a computer hardware
manufacturer are defective. If we order 100 such chips, will X, the number of defective 
ones we receive, be a binomial random variable? 

SOLUTION The random variable X will be a binomial random variable with parameters 
(100, .1) if each chip has probability .9 of being functional and if the functioning of 
successive chips is independent. Whether this is a reasonable assumption when we know 
that 10 percent of the chips produced are defective depends on additional factors. For 
instance, suppose that all the chips produced on a given day are always either functional 
or defective (with 90 percent of the days resulting in functional chips). In this case, if we 
know that all of our 100 chips were manufactured on the same day, then X will not be 
a binomial random variable. This is so since the independence of successive chips is not 
valid. In fact, in this case, we would have 

P{X= 100} = .1 
P{X = 0} = .9 ■ 

Since a binomial random variable X, with parameters n and p, represents the number of successes in n independent trials, each having success probability p, we can represent X as follows:

$$X = \sum_{i=1}^{n} X_i \tag{5.1.3}$$

where

$$X_i = \begin{cases} 1 & \text{if the } i\text{th trial is a success} \\ 0 & \text{otherwise} \end{cases}$$



Because the X_i, i = 1, ..., n, are independent Bernoulli random variables, we have that

$$\begin{aligned}
E[X_i] &= P\{X_i = 1\} = p \\
\operatorname{Var}(X_i) &= E[X_i^2] - p^2 = p(1-p)
\end{aligned}$$

where the last equality follows since X_i^2 = X_i, and so E[X_i^2] = E[X_i] = p.

Using the representation in Equation 5.1.3, it is now an easy matter to compute the mean and variance of X:

$$E[X] = \sum_{i=1}^{n} E[X_i] = np$$

$$\begin{aligned}
\operatorname{Var}(X) &= \sum_{i=1}^{n} \operatorname{Var}(X_i) \qquad \text{since the } X_i \text{ are independent} \\
&= np(1-p)
\end{aligned}$$
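To see these formulas agree with a brute-force computation, the sketch below computes E[X] and Var(X) directly from the binomial probability mass function and prints them beside np and np(1 − p). The parameter values n = 12, p = .3 are arbitrary choices for illustration.

```python
from math import comb

def binomial_pmf(n: int, p: float) -> list:
    """P{X = k} for k = 0..n when X is binomial (n, p)."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def mean_and_variance(pmf: list) -> tuple:
    """E[X] and Var(X) = E[X^2] - (E[X])^2 computed from a pmf on 0, 1, 2, ..."""
    mean = sum(k * q for k, q in enumerate(pmf))
    second_moment = sum(k * k * q for k, q in enumerate(pmf))
    return mean, second_moment - mean**2

n, p = 12, 0.3
mean, var = mean_and_variance(binomial_pmf(n, p))
print(mean, n * p)
print(var, n * p * (1 - p))
```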

If X_1 and X_2 are independent binomial random variables having respective parameters (n_i, p), i = 1, 2, then their sum is binomial with parameters (n_1 + n_2, p). This can most easily be seen by noting that because X_i, i = 1, 2, represents the number of successes in n_i independent trials each of which is a success with probability p, then X_1 + X_2 represents the number of successes in n_1 + n_2 independent trials each of which is a success with probability p. Therefore, X_1 + X_2 is binomial with parameters (n_1 + n_2, p).
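The same conclusion can be checked numerically: convolving the two probability mass functions (summing P{X_1 = i}P{X_2 = j} over i + j = s) should reproduce the binomial (n_1 + n_2, p) mass function. The sketch below does this for one arbitrary choice of n_1, n_2, p and prints the largest discrepancy, which is at the level of floating-point rounding.

```python
from math import comb

def binomial_pmf(n: int, p: float) -> list:
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def convolve(a: list, b: list) -> list:
    """pmf of X1 + X2 for independent X1 with pmf a and X2 with pmf b, both on 0, 1, 2, ..."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

n1, n2, p = 4, 7, 0.35
summed = convolve(binomial_pmf(n1, p), binomial_pmf(n2, p))
direct = binomial_pmf(n1 + n2, p)
print(max(abs(s - d) for s, d in zip(summed, direct)))
```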

5.1.1 Computing the Binomial Distribution Function 

Suppose that X is binomial with parameters (n, p). The key to computing its distribution function

$$P\{X \le i\} = \sum_{k=0}^{i} \binom{n}{k} p^k (1-p)^{n-k}, \qquad i = 0, 1, \ldots, n$$

is to utilize the following relationship between P{X = k + 1} and P{X = k}:

$$P\{X = k+1\} = \frac{p}{1-p}\,\frac{n-k}{k+1}\,P\{X = k\} \tag{5.1.4}$$

The proof of this equation is left as an exercise. 

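EXAMPLE 5.1e Let X be a binomial random variable with parameters n = 6, p = .4. Then, starting with P{X = 0} = (.6)^6 and recursively employing Equation 5.1.4, we obtain

$$\begin{aligned}
P\{X = 0\} &= (.6)^6 = .0467 \\
P\{X = 1\} &= \tfrac{4}{6}\cdot\tfrac{6}{1}\,P\{X = 0\} = .1866 \\
P\{X = 2\} &= \tfrac{4}{6}\cdot\tfrac{5}{2}\,P\{X = 1\} = .3110 \\
P\{X = 3\} &= \tfrac{4}{6}\cdot\tfrac{4}{3}\,P\{X = 2\} = .2765 \\
P\{X = 4\} &= \tfrac{4}{6}\cdot\tfrac{3}{4}\,P\{X = 3\} = .1382 \\
P\{X = 5\} &= \tfrac{4}{6}\cdot\tfrac{2}{5}\,P\{X = 4\} = .0369 \\
P\{X = 6\} &= \tfrac{4}{6}\cdot\tfrac{1}{6}\,P\{X = 5\} = .0041
\end{aligned}$$

■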
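The recursion is easy to mechanize. This minimal Python sketch starts from P{X = 0} = (1 − p)^n and applies Equation 5.1.4 repeatedly; with n = 6, p = .4 it reproduces the table of Example 5.1e to four decimals. It assumes 0 < p < 1 so that the ratio p/(1 − p) is defined.

```python
def binomial_pmf_recursive(n: int, p: float) -> list:
    """P{X = k}, k = 0..n, via Equation 5.1.4; assumes 0 < p < 1."""
    probs = [(1 - p) ** n]                                        # P{X = 0}
    for k in range(n):
        probs.append(p / (1 - p) * (n - k) / (k + 1) * probs[-1])  # P{X = k + 1}
    return probs

for k, prob in enumerate(binomial_pmf_recursive(6, 0.4)):
    print(f"P{{X = {k}}} = {prob:.4f}")
```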

The text disk uses Equation 5.1.4 to compute binomial probabilities. In using it, one enters the binomial parameters n and p and a value i, and the program computes the probabilities that a binomial (n, p) random variable is equal to, and is less than or equal to, i.
FIGURE 5.2 Binomial Distribution program output: Probability (Number of Successes = i) = .04575381; Probability (Number of Successes <= i) = .14954105.



EXAMPLE 5.1f If X is a binomial random variable with parameters n = 100 and p = .75, find P{X = 70} and P{X ≤ 70}.

SOLUTION The text disk gives the answers shown in Figure 5.2. ■ 
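The text disk program itself is not listed here. As a stand-in, the following Python sketch does what the preceding description says the program does: given n, p and i, it returns P{X = i} and P{X ≤ i}, accumulating the terms of Equation 5.1.4. The name `binomial_probs` is hypothetical.

```python
def binomial_probs(n: int, p: float, i: int) -> tuple:
    """(P{X = i}, P{X <= i}) for X binomial (n, p), 0 < p < 1, via Equation 5.1.4."""
    prob = (1 - p) ** n          # P{X = 0}
    cumulative = prob
    for k in range(i):
        prob *= p / (1 - p) * (n - k) / (k + 1)   # now P{X = k + 1}
        cumulative += prob
    return prob, cumulative

eq, le = binomial_probs(100, 0.75, 70)
print(f"P(X = 70)  = {eq:.8f}")
print(f"P(X <= 70) = {le:.8f}")
```

Even though P{X = 0} = .25^100 is tiny here, it is still well within double-precision range, so the recursion loses no accuracy for this example.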



5.2 THE POISSON RANDOM VARIABLE 

A random variable X, taking on one of the values 0, 1, 2, ..., is said to be a Poisson random variable with parameter λ, λ > 0, if its probability mass function is given by

$$P\{X = i\} = e^{-\lambda}\frac{\lambda^i}{i!}, \qquad i = 0, 1, \ldots \tag{5.2.1}$$



The symbol e stands for a constant approximately equal to 2.7183. It is a famous constant in mathematics, named after the Swiss mathematician L. Euler, and it is also the base of the so-called natural logarithm.

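Equation 5.2.1 defines a probability mass function, since

$$\sum_{i=0}^{\infty} p(i) = e^{-\lambda}\sum_{i=0}^{\infty}\frac{\lambda^i}{i!} = e^{-\lambda}e^{\lambda} = 1$$

A graph of this mass function when λ = 4 is given in Figure 5.3.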

The Poisson probability distribution was introduced by S. D. Poisson in a book he wrote dealing with the application of probability theory to lawsuits, criminal trials, and the like. This book, published in 1837, was entitled Recherches sur la probabilité des jugements en matière criminelle et en matière civile.
FIGURE 5.3 The Poisson probability mass function with λ = 4.



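As a prelude to determining the mean and variance of a Poisson random variable, let us first determine its moment generating function:

$$\begin{aligned}
\phi(t) = E[e^{tX}] &= \sum_{i=0}^{\infty} e^{ti}e^{-\lambda}\lambda^i/i! \\
&= e^{-\lambda}\sum_{i=0}^{\infty}(\lambda e^t)^i/i! \\
&= e^{-\lambda}e^{\lambda e^t} \\
&= \exp\{\lambda(e^t - 1)\}
\end{aligned}$$

Differentiation yields

$$\begin{aligned}
\phi'(t) &= \lambda e^t \exp\{\lambda(e^t - 1)\} \\
\phi''(t) &= (\lambda e^t)^2 \exp\{\lambda(e^t - 1)\} + \lambda e^t \exp\{\lambda(e^t - 1)\}
\end{aligned}$$

Evaluating at t = 0 gives that

$$\begin{aligned}
E[X] &= \phi'(0) = \lambda \\
\operatorname{Var}(X) &= \phi''(0) - (E[X])^2 = \lambda^2 + \lambda - \lambda^2 = \lambda
\end{aligned}$$

Thus both the mean and the variance of a Poisson random variable are equal to the parameter λ.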

The Poisson random variable has a wide range of applications in a variety of areas because it may be used as an approximation for a binomial random variable with parameters (n, p) when n is large and p is small. To see this, suppose that X is a binomial random variable with parameters (n, p) and let λ = np. Then

$$\begin{aligned}
P\{X = i\} &= \frac{n!}{(n-i)!\,i!}\,p^i(1-p)^{n-i} \\
&= \frac{n!}{(n-i)!\,i!}\left(\frac{\lambda}{n}\right)^i\left(1-\frac{\lambda}{n}\right)^{n-i} \\
&= \frac{n(n-1)\cdots(n-i+1)}{n^i}\,\frac{\lambda^i}{i!}\,\frac{(1-\lambda/n)^n}{(1-\lambda/n)^i}
\end{aligned}$$

Now, for n large and p small,

$$\left(1-\frac{\lambda}{n}\right)^n \approx e^{-\lambda}, \qquad \frac{n(n-1)\cdots(n-i+1)}{n^i} \approx 1, \qquad \left(1-\frac{\lambda}{n}\right)^i \approx 1$$

Hence, for n large and p small,

$$P\{X = i\} \approx e^{-\lambda}\frac{\lambda^i}{i!}$$

In other words, if n independent trials, each of which results in a "success" with probability p, are performed, then when n is large and p small, the number of successes occurring is approximately a Poisson random variable with mean λ = np.
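A quick numerical illustration of the approximation: hold λ = np fixed at 2 (an arbitrary choice) and let n grow. The sketch prints the largest absolute difference between the binomial (n, λ/n) and Poisson (λ) probabilities over i = 0, ..., 10, which shrinks as n increases.

```python
from math import comb, exp, factorial

def max_gap(n: int, lam: float, i_max: int = 10) -> float:
    """Largest |binomial(n, lam/n) pmf - Poisson(lam) pmf| over i = 0..i_max."""
    p = lam / n
    return max(
        abs(comb(n, i) * p**i * (1 - p) ** (n - i) - exp(-lam) * lam**i / factorial(i))
        for i in range(i_max + 1)
    )

for n in (10, 100, 1000):
    print(n, max_gap(n, 2.0))
```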

Some examples of random variables that usually obey, to a good approximation, the Poisson probability law (that is, they usually obey Equation 5.2.1 for some value of λ) are:

1. The number of misprints on a page (or a group of pages) of a book. 

2. The number of people in a community living to 100 years of age. 

3. The number of wrong telephone numbers that are dialed in a day. 

4. The number of transistors that fail on their first day of use. 

5. The number of customers entering a post office on a given day. 
6. The number of α-particles discharged in a fixed period of time from some radioactive material.

Each of the foregoing, and numerous other random variables, is approximately Poisson 
for the same reason — namely, because of the Poisson approximation to the binomial. For 
instance, we can suppose that there is a small probability p that each letter typed on a page will be misprinted, and so the number of misprints on a given page will be approximately Poisson with mean λ = np where n is the (presumably) large number of letters on that page. Similarly, we can suppose that each person in a given community, independently, has a small probability p of reaching the age 100, and so the number of people that do will have
approximately a Poisson distribution with mean np where n is the large number of people 
in the community. We leave it for the reader to reason out why the remaining random 
variables in examples 3 through 6 should have approximately a Poisson distribution. 

EXAMPLE 5.2a Suppose that the average number of accidents occurring weekly on a par- 
ticular stretch of a highway equals 3. Calculate the probability that there is at least one 
accident this week. 

SOLUTION Let X denote the number of accidents occurring on the stretch of highway in 
question during this week. Because it is reasonable to suppose that there are a large number 
of cars passing along that stretch, each having a small probability of being involved in 
an accident, the number of such accidents should be approximately Poisson distributed. 
Hence, 



$$\begin{aligned}
P\{X \ge 1\} &= 1 - P\{X = 0\} \\
&= 1 - e^{-3}\frac{3^0}{0!} \\
&= 1 - e^{-3} \\
&\approx .9502
\end{aligned}$$

■



EXAMPLE 5.2b Suppose the probability that an item produced by a certain machine will 
be defective is .1. Find the probability that a sample of 10 items will contain at most one 
defective item. Assume that the quality of successive items is independent. 

SOLUTION The desired probability is

$$\binom{10}{0}(.1)^0(.9)^{10} + \binom{10}{1}(.1)^1(.9)^9 = .7361$$

whereas the Poisson approximation, with λ = np = 10(.1) = 1, yields the value

$$e^{-1}\frac{1^0}{0!} + e^{-1}\frac{1^1}{1!} = 2e^{-1} \approx .7358$$

■
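Both numbers in Example 5.2b can be reproduced in a couple of lines; this sketch evaluates the exact binomial sum and the Poisson approximation with λ = 1.

```python
from math import comb, exp

exact = comb(10, 0) * 0.1**0 * 0.9**10 + comb(10, 1) * 0.1**1 * 0.9**9
poisson = 2 * exp(-1)   # e^{-1}(1^0/0! + 1^1/1!) with lambda = np = 1
print(f"exact {exact:.4f}, Poisson approximation {poisson:.4f}")
```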

EXAMPLE 5.2c Consider an experiment that consists of counting the number of α-particles given off in a one-second interval by one gram of radioactive material. If we know from past experience that, on the average, 3.2 such α-particles are given off, what is a good approximation to the probability that no more than 2 α-particles will appear?

SOLUTION If we think of the gram of radioactive material as consisting of a large number n of atoms, each of which has probability 3.2/n of disintegrating and sending off an α-particle during the second considered, then we see that, to a very close approximation, the number of α-particles given off will be a Poisson random variable with parameter λ = 3.2. Hence the desired probability is

$$P\{X \le 2\} = e^{-3.2} + 3.2e^{-3.2} + \frac{(3.2)^2}{2}e^{-3.2} \approx .3799$$

■
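The three-term sum in Example 5.2c is quick to evaluate numerically. The helper `poisson_cdf` below is an illustrative name, not a routine from the text disk.

```python
from math import exp, factorial

def poisson_cdf(k: int, lam: float) -> float:
    """P{X <= k} for X Poisson with mean lam, summing Equation 5.2.1 term by term."""
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(k + 1))

print(round(poisson_cdf(2, 3.2), 4))
```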

EXAMPLE 5.2d If the average number of claims handled daily by an insurance company is 
5, what proportion of days have fewer than 3 claims? What is the probability that there will
be 4 claims in exactly 3 of the next 5 days? Assume that the number of claims on different 
days is independent. 

SOLUTION Because the company probably insures a large number of clients, each having a small probability of making a claim on any given day, it is reasonable to suppose that the number of claims handled daily, call it X, is a Poisson random variable. Since E[X] = 5, the probability that there will be fewer than 3 claims on any given day is

$$\begin{aligned}
P\{X < 3\} &= P\{X = 0\} + P\{X = 1\} + P\{X = 2\} \\
&= e^{-5} + e^{-5}\frac{5^1}{1!} + e^{-5}\frac{5^2}{2!} \\
&= \frac{37}{2}e^{-5} \\
&\approx .1247
\end{aligned}$$

Since any given day will have fewer than 3 claims with probability .125, it follows, from the law of large numbers, that over the long run 12.5 percent of days will have fewer than 3 claims.

It follows from the assumed independence of the number of claims over successive days that the number of days in a 5-day span that has exactly 4 claims is a binomial random variable with parameters 5 and P{X = 4}. Because

$$P\{X = 4\} = e^{-5}\frac{5^4}{4!} \approx .1755$$

it follows that the probability that 3 of the next 5 days will have 4 claims is equal to

$$\binom{5}{3}(.1755)^3(.8245)^2 \approx .0367$$

■
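Both parts of Example 5.2d combine a Poisson probability with, in the second part, a binomial probability built on top of it. The sketch below computes P{X < 3}, P{X = 4}, and the binomial (5, P{X = 4}) probability of exactly 3 such days.

```python
from math import comb, exp, factorial

def poisson_pmf(i: int, lam: float) -> float:
    return exp(-lam) * lam**i / factorial(i)

p_fewer_than_3 = sum(poisson_pmf(i, 5) for i in range(3))      # P{X < 3}
p_four = poisson_pmf(4, 5)                                      # P{X = 4}
p_three_of_five = comb(5, 3) * p_four**3 * (1 - p_four) ** 2    # binomial (5, P{X = 4}) at 3
print(round(p_fewer_than_3, 4), round(p_four, 4), round(p_three_of_five, 4))
```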
The Poisson approximation result can be shown to be valid under even more general conditions than those so far mentioned. For instance, suppose that n independent trials are to be performed, with the ith trial resulting in a success with probability p_i, i = 1, ..., n. Then it can be shown that if n is large and each p_i is small, then the number of successful trials is approximately Poisson distributed with mean equal to p_1 + ⋯ + p_n. In fact, this result will sometimes remain true even when the trials are not independent, provided that their dependence is "weak." For instance, consider the following example.

EXAMPLE 5.2e At a party n people put their hats in the center of a room, where the 
hats are mixed together. Each person then randomly chooses a hat. If X denotes the 
number of people who select their own hat then, for large n, it can be shown that X has 
approximately a Poisson distribution with mean 1. To see why this might be true, let

$$X_i = \begin{cases} 1 & \text{if the } i\text{th person selects his or her own hat} \\ 0 & \text{otherwise} \end{cases}$$

Then we can express X as

$$X = X_1 + \cdots + X_n$$

and so X can be regarded as representing the number of "successes" in n "trials" where trial i is said to be a success if the ith person chooses his own hat. Now, since the ith person is equally likely to end up with any of the n hats, one of which is his own, it follows that

$$P\{X_i = 1\} = \frac{1}{n} \tag{5.2.2}$$

Suppose now that i ≠ j and consider the conditional probability that the ith person chooses his own hat given that the jth person does; that is, consider P{X_i = 1 | X_j = 1}. Now given that the jth person indeed selects his own hat, it follows that the ith individual is equally likely to end up with any of the remaining n − 1 hats, one of which is his own. Hence, it follows that

$$P\{X_i = 1 \mid X_j = 1\} = \frac{1}{n-1} \tag{5.2.3}$$

Thus, we see from Equations 5.2.2 and 5.2.3 that whereas the trials are not independent, their dependence is rather weak [since, if the above conditional probability were equal to 1/n rather than 1/(n − 1), then trials i and j would be independent]; and thus it is not at all surprising that X has approximately a Poisson distribution. The fact that E[X] = 1 follows since

$$\begin{aligned}
E[X] &= E[X_1 + \cdots + X_n] \\
&= E[X_1] + \cdots + E[X_n] \\
&= n\left(\frac{1}{n}\right) = 1
\end{aligned}$$

The last equality follows since, from Equation 5.2.2,

$$E[X_i] = P\{X_i = 1\} = \frac{1}{n}$$

■

The Poisson distribution possesses the reproductive property that the sum of independent Poisson random variables is also a Poisson random variable. To see this, suppose that X_1 and X_2 are independent Poisson random variables having respective means λ_1 and λ_2. Then the moment generating function of X_1 + X_2 is as follows:

$$\begin{aligned}
E[e^{t(X_1+X_2)}] &= E[e^{tX_1}e^{tX_2}] \\
&= E[e^{tX_1}]\,E[e^{tX_2}] \qquad \text{by independence} \\
&= \exp\{\lambda_1(e^t-1)\}\exp\{\lambda_2(e^t-1)\} \\
&= \exp\{(\lambda_1+\lambda_2)(e^t-1)\}
\end{aligned}$$

Because exp{(λ_1 + λ_2)(e^t − 1)} is the moment generating function of a Poisson random variable having mean λ_1 + λ_2, we may conclude, from the fact that the moment generating function uniquely specifies the distribution, that X_1 + X_2 is Poisson with mean λ_1 + λ_2.

EXAMPLE 5.2f It has been established that the number of defective stereos produced daily 
at a certain plant is Poisson distributed with mean 4. Over a 2-day span, what is the 
probability that the number of defective stereos does not exceed 3? 

SOLUTION Assuming that X_1, the number of defectives produced during the first day, is independent of X_2, the number produced during the second day, then X_1 + X_2 is Poisson with mean 8. Hence,

$$P\{X_1 + X_2 \le 3\} = \sum_{i=0}^{3} e^{-8}\frac{8^i}{i!} = .04238$$

■
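The reproductive property can be checked on this example directly: summing P{X_1 = i}P{X_2 = j} over all pairs with i + j ≤ 3 (each day Poisson with mean 4) should give the same answer as the Poisson (8) distribution function at 3.

```python
from math import exp, factorial

def poisson_pmf(i: int, lam: float) -> float:
    return exp(-lam) * lam**i / factorial(i)

# Sum over all (i, j) with i + j <= 3, where X1 = i and X2 = j each have mean 4.
by_convolution = sum(poisson_pmf(i, 4) * poisson_pmf(j, 4)
                     for i in range(4) for j in range(4 - i))
# Reproductive property: X1 + X2 is Poisson with mean 8.
direct = sum(poisson_pmf(i, 8) for i in range(4))
print(round(by_convolution, 5), round(direct, 5))
```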

Consider now a situation in which a random number, call it N, of events will occur, and suppose that each of these events will independently be a type 1 event with probability p or a type 2 event with probability 1 − p. Let N_1 and N_2 denote, respectively, the numbers of type 1 and type 2 events that occur. (So N = N_1 + N_2.) If N is Poisson distributed with mean λ, then the joint probability mass function of N_1 and N_2 is obtained as follows.

$$\begin{aligned}
P\{N_1 = n, N_2 = m\} &= P\{N_1 = n, N_2 = m, N = n+m\} \\
&= P\{N_1 = n, N_2 = m \mid N = n+m\}\,P\{N = n+m\} \\
&= P\{N_1 = n, N_2 = m \mid N = n+m\}\,e^{-\lambda}\frac{\lambda^{n+m}}{(n+m)!}
\end{aligned}$$
Now, given a total of n + m events, because each one of these events is independently type 1 with probability p, it follows that the conditional probability that there are exactly n type 1 events (and m type 2 events) is the probability that a binomial (n + m, p) random variable is equal to n. Consequently,

$$\begin{aligned}
P\{N_1 = n, N_2 = m\} &= \frac{(n+m)!}{n!\,m!}\,p^n(1-p)^m\,e^{-\lambda}\frac{\lambda^{n+m}}{(n+m)!} \\
&= e^{-\lambda}\frac{(\lambda p)^n}{n!}\,\frac{(\lambda(1-p))^m}{m!}
\end{aligned} \tag{5.2.4}$$

The probability mass function of N_1 is thus

$$\begin{aligned}
P\{N_1 = n\} &= \sum_{m=0}^{\infty} P\{N_1 = n, N_2 = m\} \\
&= e^{-\lambda}\frac{(\lambda p)^n}{n!}\sum_{m=0}^{\infty}\frac{(\lambda(1-p))^m}{m!} \\
&= e^{-\lambda p}\frac{(\lambda p)^n}{n!}
\end{aligned} \tag{5.2.5}$$

Similarly,

$$P\{N_2 = m\} = \sum_{n=0}^{\infty} P\{N_1 = n, N_2 = m\} = e^{-\lambda(1-p)}\frac{(\lambda(1-p))^m}{m!} \tag{5.2.6}$$

It now follows from Equations 5.2.4, 5.2.5, and 5.2.6 that N_1 and N_2 are independent Poisson random variables with respective means λp and λ(1 − p).

The preceding result generalizes when each of the Poisson number of events can be classified into any of r categories, to yield the following important property of the Poisson distribution: If each of a Poisson number of events having mean λ is independently classified as being of one of the types 1, ..., r, with respective probabilities p_1, ..., p_r, where p_1 + ⋯ + p_r = 1, then the numbers of type 1, ..., r events are independent Poisson random variables with respective means λp_1, ..., λp_r.
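The two-type case can be verified numerically: the joint probability in the first line of Equation 5.2.4 (binomial split times Poisson total) should equal the product of the Poisson (λp) and Poisson (λ(1 − p)) mass functions for every (n, m). The sketch below checks this on a grid for one arbitrary choice λ = 3, p = .3.

```python
from math import comb, exp, factorial

def poisson_pmf(i: int, lam: float) -> float:
    return exp(-lam) * lam**i / factorial(i)

def factorization_gap(lam: float, p: float, limit: int = 15) -> float:
    """Max |binomial-split joint pmf - product of Poisson marginals| over 0 <= n, m < limit."""
    worst = 0.0
    for n in range(limit):
        for m in range(limit):
            joint = comb(n + m, n) * p**n * (1 - p) ** m * poisson_pmf(n + m, lam)
            product = poisson_pmf(n, lam * p) * poisson_pmf(m, lam * (1 - p))
            worst = max(worst, abs(joint - product))
    return worst

print(factorization_gap(3.0, 0.3))
```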

5.2.1 Computing the Poisson Distribution Function

If X is Poisson with mean λ, then

$$\frac{P\{X = i+1\}}{P\{X = i\}} = \frac{e^{-\lambda}\lambda^{i+1}/(i+1)!}{e^{-\lambda}\lambda^{i}/i!} = \frac{\lambda}{i+1} \tag{5.2.7}$$

Starting with P{X = 0} = e^{−λ}, we can use Equation 5.2.7 to successively compute

$$\begin{aligned}
P\{X = 1\} &= \lambda\,P\{X = 0\} \\
P\{X = 2\} &= \frac{\lambda}{2}\,P\{X = 1\} \\
&\;\;\vdots \\
P\{X = i+1\} &= \frac{\lambda}{i+1}\,P\{X = i\}
\end{aligned}$$

The text disk includes a program that uses Equation 5.2.7 to compute Poisson probabilities. 
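A minimal Python sketch of the same recursion follows; it is not the text disk program, just an illustration of Equation 5.2.7 that returns P{X = i} and P{X ≤ i}. The name `poisson_probs` is hypothetical.

```python
from math import exp

def poisson_probs(lam: float, i: int) -> tuple:
    """(P{X = i}, P{X <= i}) for X Poisson with mean lam, via Equation 5.2.7."""
    prob = exp(-lam)              # P{X = 0}
    cumulative = prob
    for k in range(i):
        prob *= lam / (k + 1)     # P{X = k + 1} = lam / (k + 1) * P{X = k}
        cumulative += prob
    return prob, cumulative

print(poisson_probs(8, 3))  # second entry matches the .04238 of Example 5.2f
```

For very large λ (several hundred or more), e^{−λ} underflows in double precision, so a production routine would start the recursion elsewhere or work with logarithms; for the moderate means in this section the direct start is fine.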

5.3 THE HYPERGEOMETRIC RANDOM VARIABLE 

A bin contains N + M batteries, of which N are of acceptable quality and the other M are defective. A sample of size n is to be randomly chosen (without replacement) in the sense that the set of sampled batteries is equally likely to be any of the $\binom{N+M}{n}$ subsets of size n. If we let X denote the number of acceptable batteries in the sample, then

$$P\{X = i\} = \frac{\binom{N}{i}\binom{M}{n-i}}{\binom{N+M}{n}}, \qquad i = 0, 1, \ldots, \min(N, n)^{*} \tag{5.3.1}$$

Any random variable X whose probability mass function is given by Equation 5.3.1 is said to be a hypergeometric random variable with parameters N, M, n.

EXAMPLE 5.3a The components of a 6-component system are to be randomly chosen from a bin of 20 used components. The resulting system will be functional if at least 4 of its 6 components are in working condition. If 15 of the 20 components in the bin are in working condition, what is the probability that the resulting system will be functional?

SOLUTION If X is the number of working components chosen, then X is hypergeometric with parameters 15, 5, 6. The probability that the system will be functional is

$$\begin{aligned}
P\{X \ge 4\} &= \sum_{i=4}^{6} P\{X = i\} \\
&= \frac{\binom{15}{4}\binom{5}{2} + \binom{15}{5}\binom{5}{1} + \binom{15}{6}\binom{5}{0}}{\binom{20}{6}} \\
&\approx .8687
\end{aligned}$$

■
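Example 5.3a can be reproduced by coding Equation 5.3.1 directly, using the footnote's convention that a binomial coefficient with r > m or r < 0 is 0. (Python's `math.comb` already returns 0 when r > m, but raises an error for negative r, so that case is guarded explicitly.)

```python
from math import comb

def hypergeometric_pmf(i: int, N: int, M: int, n: int) -> float:
    """Equation 5.3.1 with the convention C(m, r) = 0 when r > m or r < 0."""
    if i < 0 or n - i < 0:
        return 0.0
    return comb(N, i) * comb(M, n - i) / comb(N + M, n)

p_functional = sum(hypergeometric_pmf(i, 15, 5, 6) for i in range(4, 7))
print(round(p_functional, 4))
```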



* We are following the convention that $\binom{m}{r} = 0$ if r > m or if r < 0.
To compute the mean and variance of a hypergeometric random variable whose probability mass function is given by Equation 5.3.1, imagine that the batteries are drawn sequentially and let

$$X_i = \begin{cases} 1 & \text{if the } i\text{th selection is acceptable} \\ 0 & \text{otherwise} \end{cases}$$

Now, since the ith selection is equally likely to be any of the N + M batteries, of which N are acceptable, it follows that

$$P\{X_i = 1\} = \frac{N}{N+M} \tag{5.3.2}$$

Also, for i ≠ j,

$$P\{X_i = 1, X_j = 1\} = P\{X_i = 1\}\,P\{X_j = 1 \mid X_i = 1\} = \frac{N}{N+M}\,\frac{N-1}{N+M-1} \tag{5.3.3}$$

which follows since, given that the ith selection is acceptable, the jth selection is equally likely to be any of the other N + M − 1 batteries, of which N − 1 are acceptable.

To compute the mean and variance of X, the number of acceptable batteries in the sample of size n, use the representation

$$X = \sum_{i=1}^{n} X_i$$

This gives

$$E[X] = \sum_{i=1}^{n} E[X_i] = \sum_{i=1}^{n} P\{X_i = 1\} = \frac{nN}{N+M} \tag{5.3.4}$$

Also, Corollary 4.7.3 for the variance of a sum of random variables gives

$$\operatorname{Var}(X) = \sum_{i=1}^{n}\operatorname{Var}(X_i) + 2\sum_{1 \le i < j \le n}\operatorname{Cov}(X_i, X_j) \tag{5.3.5}$$

Now, X_i is a Bernoulli random variable and so

$$\operatorname{Var}(X_i) = P\{X_i = 1\}\bigl(1 - P\{X_i = 1\}\bigr) = \frac{N}{N+M}\,\frac{M}{N+M} \tag{5.3.6}$$
Also, for i < j,

$$\operatorname{Cov}(X_i, X_j) = E[X_iX_j] - E[X_i]E[X_j]$$

Now, because both X_i and X_j are Bernoulli (that is, 0-1) random variables, it follows that X_iX_j is a Bernoulli random variable, and so

$$\begin{aligned}
E[X_iX_j] &= P\{X_iX_j = 1\} \\
&= P\{X_i = 1, X_j = 1\} \\
&= \frac{N(N-1)}{(N+M)(N+M-1)} \qquad \text{from Equation 5.3.3}
\end{aligned} \tag{5.3.7}$$

So from Equation 5.3.2 and the foregoing we see that for i ≠ j,

$$\begin{aligned}
\operatorname{Cov}(X_i, X_j) &= \frac{N(N-1)}{(N+M)(N+M-1)} - \left(\frac{N}{N+M}\right)^2 \\
&= \frac{-NM}{(N+M)^2(N+M-1)}
\end{aligned}$$

Hence, since there are $\binom{n}{2}$ terms in the second sum on the right side of Equation 5.3.5, we obtain from Equation 5.3.6

$$\begin{aligned}
\operatorname{Var}(X) &= \frac{nNM}{(N+M)^2} - \frac{n(n-1)NM}{(N+M)^2(N+M-1)} \\
&= \frac{nNM}{(N+M)^2}\left(1 - \frac{n-1}{N+M-1}\right)
\end{aligned} \tag{5.3.8}$$

If we let p = N/(N + M) denote the proportion of batteries in the bin that are acceptable, we can rewrite Equations 5.3.4 and 5.3.8 as follows.

$$\begin{aligned}
E[X] &= np \\
\operatorname{Var}(X) &= np(1-p)\left[1 - \frac{n-1}{N+M-1}\right]
\end{aligned}$$

It should be noted that, for fixed p, as N + M increases to ∞, Var(X) converges to np(1 − p), which is the variance of a binomial random variable with parameters (n, p). (Why was this to be expected?)
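Because all the quantities involved are rational, the formulas E[X] = np and Var(X) = np(1 − p)[1 − (n − 1)/(N + M − 1)] can be confirmed exactly, not just approximately, by summing over Equation 5.3.1 with Python's `fractions.Fraction`. The parameter values below are the ones from Example 5.3a.

```python
from fractions import Fraction
from math import comb

def hypergeometric_pmf(i: int, N: int, M: int, n: int) -> Fraction:
    """Equation 5.3.1 as an exact fraction (0 when n - i is negative)."""
    if i < 0 or n - i < 0:
        return Fraction(0)
    return Fraction(comb(N, i) * comb(M, n - i), comb(N + M, n))

def moments(N: int, M: int, n: int) -> tuple:
    """Exact E[X] and Var(X) obtained by summing over the support."""
    support = range(min(N, n) + 1)
    mean = sum(i * hypergeometric_pmf(i, N, M, n) for i in support)
    second_moment = sum(i * i * hypergeometric_pmf(i, N, M, n) for i in support)
    return mean, second_moment - mean**2

N, M, n = 15, 5, 6
p = Fraction(N, N + M)
mean, var = moments(N, M, n)
print(mean == n * p)
print(var == n * p * (1 - p) * (1 - Fraction(n - 1, N + M - 1)))
```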

EXAMPLE 5.3b An unknown number, say N, of animals inhabit a certain region. 
To obtain some information about the population size, ecologists often perform the 
following experiment: They first catch a number, say r, of these animals, mark them 
in some manner, and release them. After allowing the marked animals time to disperse 
throughout the region, a new catch of size, say, n is made. Let X denote the number of marked animals in this second capture. If we assume that the population of animals in the region remained fixed between the time of the two catches and that each time an animal was caught it was equally likely to be any of the remaining uncaught animals, it follows that X is a hypergeometric random variable such that

$$P\{X = i\} = \frac{\binom{r}{i}\binom{N-r}{n-i}}{\binom{N}{n}} \equiv P_i(N)$$

Suppose now that X is observed to equal i. That is, the fraction i/n of the animals in the second catch were marked. By taking this as an approximation of r/N, the proportion of animals in the region that are marked, we obtain the estimate rn/i of the number of animals in the region. For instance, if r = 50 animals are initially caught, marked, and then released, and a subsequent catch of n = 100 animals revealed X = 25 of them that were marked, then we would estimate the number of animals in the region to be about 200. ■

There is a relationship between binomial random variables and the hypergeo- 
metric distribution that will be useful to us in developing a statistical test concerning 
two binomial populations. 

EXAMPLE 5.3c Let X and Y be independent binomial random variables having respective parameters (n, p) and (m, p). The conditional probability mass function of X given that X + Y = k is as follows.

$$\begin{aligned}
P\{X = i \mid X + Y = k\} &= \frac{P\{X = i, X + Y = k\}}{P\{X + Y = k\}} \\
&= \frac{P\{X = i, Y = k - i\}}{P\{X + Y = k\}} \\
&= \frac{P\{X = i\}\,P\{Y = k - i\}}{P\{X + Y = k\}} \\
&= \frac{\binom{n}{i}p^i(1-p)^{n-i}\binom{m}{k-i}p^{k-i}(1-p)^{m-(k-i)}}{\binom{n+m}{k}p^k(1-p)^{n+m-k}} \\
&= \frac{\binom{n}{i}\binom{m}{k-i}}{\binom{n+m}{k}}
\end{aligned}$$

where the next-to-last equality used the fact that X + Y is binomial with parameters (n + m, p). Hence, we see that the conditional distribution of X given the value of X + Y is hypergeometric.

It is worth noting that the preceding is quite intuitive. For suppose that n + m inde- 
pendent trials, each of which has the same probability of being a success, are performed; 
let X be the number of successes in the first n trials, and let Y be the number of successes 
in the final m trials. Given a total of k successes in the n + m trials, it is quite intuitive 
that each subgroup of k trials is equally likely to consist of those trials that resulted in 
successes. That is, the k success trials are distributed as a random selection of k of the 
n + m trials, and so the number that are from the first n trials is hypergeometric. ■ 
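The striking feature of this result is that p drops out. The sketch below illustrates it numerically for one arbitrary choice of n, m, k: the conditional probabilities computed from the binomial formulas are the same for different values of p and coincide with the hypergeometric expression.

```python
from math import comb

def binom_pmf(j: int, n: int, p: float) -> float:
    return comb(n, j) * p**j * (1 - p) ** (n - j) if 0 <= j <= n else 0.0

def conditional_pmf(i: int, k: int, n: int, m: int, p: float) -> float:
    """P{X = i | X + Y = k} for independent X ~ binomial(n, p), Y ~ binomial(m, p)."""
    return binom_pmf(i, n, p) * binom_pmf(k - i, m, p) / binom_pmf(k, n + m, p)

def hypergeometric_pmf(i: int, k: int, n: int, m: int) -> float:
    """C(n, i) C(m, k - i) / C(n + m, k), the final expression of Example 5.3c."""
    if i < 0 or k - i < 0:
        return 0.0
    return comb(n, i) * comb(m, k - i) / comb(n + m, k)

n, m, k = 6, 9, 5
for p in (0.2, 0.7):
    print([round(conditional_pmf(i, k, n, m, p), 6) for i in range(k + 1)])
print([round(hypergeometric_pmf(i, k, n, m), 6) for i in range(k + 1)])
```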

5.4 THE UNIFORM RANDOM VARIABLE 

A random variable X is said to be uniformly distributed over the interval [α, β] if its probability density function is given by

$$f(x) = \begin{cases} \dfrac{1}{\beta - \alpha} & \text{if } \alpha \le x \le \beta \\ 0 & \text{otherwise} \end{cases}$$

A graph of this function is given in Figure 5.4. Note that the foregoing meets the requirements of being a probability density function since

$$\frac{1}{\beta - \alpha}\int_{\alpha}^{\beta} dx = 1$$

The uniform distribution arises in practice when we suppose a certain random variable is equally likely to be near any value in the interval [α, β].

The probability that X lies in any subinterval of [α, β] is equal to the length of that subinterval divided by the length of the interval [α, β]. This follows since when [a, b] is a subinterval of [α, β] (see Figure 5.5),

$$P\{a < X < b\} = \frac{1}{\beta - \alpha}\int_{a}^{b} dx = \frac{b - a}{\beta - \alpha}$$

FIGURE 5.4 Graph of f(x) for a uniform [α, β].

FIGURE 5.5 Probabilities of a uniform random variable.



EXAMPLE 5.4a If X is uniformly distributed over the interval [0, 10], compute the 
probability that (a) 2 < X < 9, (b) 1 < X < 4, (c) X < 5, (d) X > 6. 

SOLUTION The respective answers are (a) 7/10, (b) 3/10, (c) 5/10, (d) 4/10. ■ 

EXAMPLE 5.4b Buses arrive at a specified stop at 15-minute intervals starting at 7 A.M. That is, they arrive at 7, 7:15, 7:30, 7:45, and so on. If a passenger arrives at the stop at a time that is uniformly distributed between 7 and 7:30, find the probability that he waits

(a) less than 5 minutes for a bus;

(b) at least 12 minutes for a bus.

SOLUTION Let X denote the time in minutes past 7 A.M. that the passenger arrives at the stop. Since X is a uniform random variable over the interval (0, 30), it follows that the passenger will have to wait less than 5 minutes if he arrives between 7:10 and 7:15 or between 7:25 and 7:30. Hence, the desired probability for (a) is

$$P\{10 < X < 15\} + P\{25 < X < 30\} = \frac{5}{30} + \frac{5}{30} = \frac{1}{3}$$

Similarly, he would have to wait at least 12 minutes if he arrives between 7 and 7:03 or between 7:15 and 7:18, and so the probability for (b) is

$$P\{0 < X < 3\} + P\{15 < X < 18\} = \frac{3}{30} + \frac{3}{30} = \frac{1}{5}$$

■
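A Monte Carlo check of Example 5.4b, using random numbers of the kind discussed later in this section: draw uniform arrival times on (0, 30), compute the wait until the next multiple of 15 minutes, and record how often it is under 5 minutes or at least 12. The trial count and seed are arbitrary; the estimates should land near 1/3 and 1/5.

```python
import random

def estimate_wait_probabilities(trials: int = 100_000, seed: int = 1) -> tuple:
    """Monte Carlo estimates of P{wait < 5} and P{wait >= 12} for Example 5.4b."""
    rng = random.Random(seed)
    short_waits = long_waits = 0
    for _ in range(trials):
        x = rng.uniform(0, 30)   # arrival time, minutes after 7 A.M.
        wait = (-x) % 15         # minutes until the next bus (buses every 15 minutes)
        short_waits += wait < 5
        long_waits += wait >= 12
    return short_waits / trials, long_waits / trials

print(estimate_wait_probabilities())
```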
The mean of a uniform [α, β] random variable is

$$\begin{aligned}
E[X] &= \int_{\alpha}^{\beta}\frac{x}{\beta - \alpha}\,dx \\
&= \frac{\beta^2 - \alpha^2}{2(\beta - \alpha)} \\
&= \frac{(\beta - \alpha)(\beta + \alpha)}{2(\beta - \alpha)} \\
&= \frac{\alpha + \beta}{2}
\end{aligned}$$

Or, in other words, the expected value of a uniform [α, β] random variable is equal to the midpoint of the interval [α, β], which is clearly what one would expect. (Why?)
The variance is computed as follows. 

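$$\begin{aligned}
E[X^2] &= \frac{1}{\beta - \alpha}\int_{\alpha}^{\beta} x^2\,dx \\
&= \frac{\beta^3 - \alpha^3}{3(\beta - \alpha)} \\
&= \frac{\beta^2 + \alpha\beta + \alpha^2}{3}
\end{aligned}$$

and so

$$\begin{aligned}
\operatorname{Var}(X) &= \frac{\beta^2 + \alpha\beta + \alpha^2}{3} - \left(\frac{\alpha + \beta}{2}\right)^2 \\
&= \frac{\alpha^2 + \beta^2 - 2\alpha\beta}{12} \\
&= \frac{(\beta - \alpha)^2}{12}
\end{aligned}$$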
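As a numerical cross-check of (α + β)/2 and (β − α)²/12, this sketch approximates E[X] and E[X²] with the midpoint rule applied to x f(x) and x² f(x). The interval [2, 5] and the number of slices are arbitrary choices.

```python
def uniform_moments(alpha: float, beta: float, steps: int = 10_000) -> tuple:
    """Approximate E[X] and Var(X) for X uniform on [alpha, beta] by the midpoint rule."""
    width = (beta - alpha) / steps
    density = 1.0 / (beta - alpha)
    mean = second_moment = 0.0
    for s in range(steps):
        x = alpha + (s + 0.5) * width      # midpoint of the s-th slice
        mean += x * density * width
        second_moment += x * x * density * width
    return mean, second_moment - mean**2

alpha, beta = 2.0, 5.0
print(uniform_moments(alpha, beta), (alpha + beta) / 2, (beta - alpha) ** 2 / 12)
```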

EXAMPLE 5.4c The current in a semiconductor diode is often measured by the Shockley equation

$$I = I_0\left(e^{aV} - 1\right)$$

where V is the voltage across the diode; I_0 is the reverse current; a is a constant; and I is the resulting diode current. Find E[I] if a = 5, I_0 = 10^{−6}, and V is uniformly distributed over (1, 3).

SOLUTION

$$\begin{aligned}
E[I] &= E\left[I_0\left(e^{aV} - 1\right)\right] \\
&= I_0\,E\left[e^{aV} - 1\right] \\
&= I_0\left(E[e^{aV}] - 1\right) \\
&= 10^{-6}\int_{1}^{3} e^{5x}\,\frac{1}{2}\,dx - 10^{-6} \\
&= 10^{-7}\left(e^{15} - e^{5}\right) - 10^{-6} \\
&\approx .3269
\end{aligned}$$

■

The value of a uniform (0, 1) random variable is called a random number. Most com- 
puter systems have a built-in subroutine for generating (to a high level of approximation) 
sequences of independent random numbers — for instance, Table 5.1 presents a set of 
independent random numbers generated by an IBM personal computer. Random numbers 
are quite useful in probability and statistics because their use enables one to empirically 
estimate various probabilities and expectations. 

TABLE 5.1 A Random Number Table



.68587 


.25848 


.85227 


.78724 


.05302 


.70712 


.76552 


.70326 


.80402 


.49479 


.73253 


.41629 


.37913 


.00236 


.60196 


.59048 


.59946 


.75657 


.61849 


.90181 


.84448 


.42477 


.94829 


.86678 


.14030 


.04072 


.45580 


.36833 


.10783 


.33199 


.49564 


.98590 


.92880 


.69970 


.83898 


.21077 


.71374 


.85967 


.20857 


.51433 


.68304 


.46922 


.14218 


.63014 


.50116 


.33569 


.97793 


.84637 


.27681 


.04354 


.76992 


.70179 


.75568 


.21792 


.50646 


.07744 


.38064 


.06107 


.41481 


.93919 


.37604 


.27772 


.75615 


.51157 


.73821 


.29928 


.62603 


.06259 


.21552 


.72977 


.43898 


.06592 


.44474 


.07517 


.44831 


.01337 


.04538 


.15198 


.50345 


.65288 


.86039 


.28645 


.44931 


.59203 


.98254 


.56697 


.55897 


.25109 


.47585 


.59524 


.28877 


.84966 


.97319 


.66633 


.71350 


.28403 


.28265 


.61379 


.13886 


.78325 


.44973 


.12332 


.16649 


.88908 


.31019 


.33358 


.68401 


.10177 


.92873 


.13065 


.42529 


.37593 


.90208 


.50331 


.37531 


.72208 


.42884 


.07435 


.58647 


.84972 


.82004 


.74696 


.10136 


.35971 


.72014 


.08345 


.49366 


.68501 


.14135 


.15718 


.67090 


.08493 


.47151 


.06464 


.14425 


.28381 


.40455 


.87302 


.07135 


.04507 


.62825 


.83809 


.37425 


.17693 


.69327 


.04144 


.00924 


.68246 


.48573 


.24647 


.10720 


.89919 


.90448 


.80838 


.70997 


.98438 


.51651 


.71379 


.10830 


.69984 


.69854 


.89270 


.54348 


.22658 


.94233 


.08889 


.52655 


.83351 


.73627 


.39018 


.71460 


.25022 


.06988 


.64146 


.69407 


.39125 


.10090 


.08415 


.07094 


.14244 


.69040 


.33461 


.79399 


.22664 


.68810 


.56303 


.65947 


.88951 


.40180 


.87943 


.13452 


.36642 


.98785 


.62929 


.88509 


.64690 


.38981 


.99092 


.91137 


.02411 


.94232 


.91117 


.98610 


.71605 


.89560 


.92921 


.51481 


.20016 


.56769 


.60462 


.99269 


.98876 


.47254 


.93637 


.83954 


.60990 


.10353 


.13206 


.33480 


.29440 


.75323 


.86974 


.91355 


.12780 


.01906 


.96412 


.61320 


.47629 


.33890 


.22099 


.75003 


.98538 


.63622 


.94890 


.96744 


.73870 


.72527 


.17745 


.01151 


.47200 
For an illustration of the use of random numbers, suppose that a medical center is planning to test a new drug designed to reduce its users' blood cholesterol levels. To test its effectiveness, the medical center has recruited 1,000 volunteers to be subjects in the test. To take into account the possibility that the subjects' blood cholesterol levels may be affected by factors external to the test (such as changing weather conditions), it has been decided to split the volunteers into 2 groups of size 500: a treatment group that will be given the drug and a control group that will be given a placebo. Neither the volunteers nor the administrators of the drug will be told who is in each group (such a test is called a double-blind test). It remains to determine which of the volunteers should be chosen to constitute the treatment group. Clearly, one would want the treatment group and the control group to be as similar as possible in all respects with the exception that members in the first group are to receive the drug while those in the other group receive a placebo; then it will be possible to conclude that any difference in response between the groups is indeed due to the drug. There is general agreement that the best way to accomplish this is to choose the 500 volunteers to be in the treatment group in a completely random fashion. That is, the choice should be made so that each of the $\binom{1000}{500}$ subsets of 500 volunteers is equally likely to constitute the control group. How can this be accomplished?

*EXAMPLE 5.4d Choosing a Random Subset From a set of n elements, numbered 1, 2, ..., n, suppose we want to generate a random subset of size k that is to be chosen in such a manner that each of the $\binom{n}{k}$ subsets is equally likely to be the subset chosen. How can we do this?

* Optional.

To answer this question, let us work backwards and suppose that we have indeed randomly generated such a subset of size k. Now for each j = 1, ..., n, we set

$$I_j = \begin{cases} 1 & \text{if element } j \text{ is in the subset} \\ 0 & \text{otherwise} \end{cases}$$

and compute the conditional distribution of I_j given I_1, ..., I_{j−1}. To start, note that the probability that element 1 is in the subset of size k is clearly k/n (which can be seen either by noting that there is probability 1/n that element 1 would have been the jth element chosen, j = 1, ..., k; or by noting that the proportion of outcomes of the random selection that result in element 1 being chosen is $\binom{1}{1}\binom{n-1}{k-1}/\binom{n}{k} = k/n$). Therefore, we have that

$$P\{I_1 = 1\} = k/n \tag{5.4.1}$$

To compute the conditional probability that element 2 is in the subset given I_1, note that if I_1 = 1, then aside from element 1 the remaining k − 1 members of the subset would have been chosen "at random" from the remaining n − 1 elements (in the sense that each of the subsets of size k − 1 of the numbers 2, ..., n is equally likely to be the other elements of the subset). Hence, we have that

$$P\{I_2 = 1 \mid I_1 = 1\} = \frac{k-1}{n-1} \tag{5.4.2}$$



Similarly, if element 1 is not in the subgroup, then the $k$ members of the subgroup would have been chosen "at random" from the other $n - 1$ elements, and thus

$$P\{I_2 = 1 \mid I_1 = 0\} = \frac{k}{n - 1} \qquad (5.4.3)$$



From Equations 5.4.2 and 5.4.3, we see that

$$P\{I_2 = 1 \mid I_1\} = \frac{k - I_1}{n - 1}$$



In general, we have that

$$P\{I_j = 1 \mid I_1, \ldots, I_{j-1}\} = \frac{k - \sum_{i=1}^{j-1} I_i}{n - j + 1}, \qquad j = 2, \ldots, n \qquad (5.4.4)$$

The preceding formula follows since $\sum_{i=1}^{j-1} I_i$ represents the number of the first $j - 1$ elements that are included in the subset, and so given $I_1, \ldots, I_{j-1}$ there remain $k - \sum_{i=1}^{j-1} I_i$ elements to be selected from the remaining $n - (j - 1)$.

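Since $P\{U < a\} = a$, $0 \le a \le 1$, when $U$ is a uniform $(0, 1)$ random variable, Equations 5.4.1 and 5.4.4 lead to the following method for generating a random subset of size $k$ from a set of $n$ elements: namely, generate a sequence of (at most $n$) random numbers $U_1, U_2, \ldots$ and set

$$I_1 = \begin{cases} 1 & \text{if } U_1 < \dfrac{k}{n} \\ 0 & \text{otherwise} \end{cases}$$

$$I_2 = \begin{cases} 1 & \text{if } U_2 < \dfrac{k - I_1}{n - 1} \\ 0 & \text{otherwise} \end{cases}$$

$$\vdots$$

$$I_j = \begin{cases} 1 & \text{if } U_j < \dfrac{k - I_1 - \cdots - I_{j-1}}{n - j + 1} \\ 0 & \text{otherwise} \end{cases}$$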
FIGURE 5.6 Tree diagram (branches labeled by conditions such as $U_1 > .4$ and $U_2 > .5$; endpoints labeled by the resulting subsets, such as $S = \{1, 3\}$, $S = \{2, 4\}$, and $S = \{4, 5\}$).



This process stops when $I_1 + \cdots + I_j = k$, and the random subset consists of the $k$ elements whose $I$-value equals 1. That is, $S = \{i : I_i = 1\}$ is the subset.

For instance, if $k = 2$, $n = 5$, then the tree diagram of Figure 5.6 illustrates the foregoing technique. The random subset $S$ is given by the final position on the tree. Note that the probability of ending up in any given final position is equal to 1/10, which can be seen by multiplying the probabilities of moving through the tree to the desired endpoint. For instance, the probability of ending at the point labeled $S = \{2, 4\}$ is

$$P\{U_1 > .4\}\,P\{U_2 < .5\}\,P\{U_3 > \tfrac{1}{3}\}\,P\{U_4 < \tfrac{1}{2}\} = (.6)(.5)\left(\tfrac{2}{3}\right)\left(\tfrac{1}{2}\right) = .1$$
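The claim that every endpoint of the tree has probability 1/10 can be checked exactly. The Python sketch below (the function name `leaf_probabilities` is our own, not the text's) follows every branch allowed by Equation 5.4.4, multiplies the conditional probabilities along each path using exact fractions, and records the probability of the subset at which each path ends.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def leaf_probabilities(n, k):
    """Map each k-subset of {1, ..., n} to its probability under Equation 5.4.4."""
    probs = {}

    def walk(j, chosen, p):
        # j: next element to decide; chosen: elements already in the subset; p: path probability
        slots = k - len(chosen)
        if slots == 0:                      # I_1 + ... + I_{j-1} = k: the path ends here
            probs[tuple(chosen)] = p
            return
        q = Fraction(slots, n - j + 1)      # P{I_j = 1 | I_1, ..., I_{j-1}}
        walk(j + 1, chosen + [j], p * q)    # branch where element j is chosen
        if q < 1:                           # when q = 1 element j must be chosen
            walk(j + 1, chosen, p * (1 - q))

    walk(1, [], Fraction(1))
    return probs


probs = leaf_probabilities(5, 2)
print(len(probs), set(probs.values()))                       # 10 {Fraction(1, 10)}
print(sorted(probs) == list(combinations(range(1, 6), 2)))   # True: every subset appears
print(all(p == Fraction(1, comb(7, 3)) for p in leaf_probabilities(7, 3).values()))  # True
```

The last line repeats the check for $k = 3$, $n = 7$, where each of the $\binom{7}{3} = 35$ subsets should have probability 1/35.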

As indicated in the tree diagram (see the rightmost branches that result in $S = \{4, 5\}$), we can stop generating random numbers when the number of remaining places in the subset to be chosen is equal to the remaining number of elements. That is, the general procedure would stop whenever either $\sum_{i=1}^{j} I_i = k$ or $\sum_{i=1}^{j} I_i = k - (n - j)$. In the latter case, $S = \{i \le j : I_i = 1\} \cup \{j + 1, \ldots, n\}$. ■
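A minimal Python sketch of the procedure, including both stopping rules, is given below. The function name `random_subset` and the use of Python's `random` module as the source of the random numbers $U_j$ are our assumptions; the frequency count at the end is only a rough empirical check that each of the 10 subsets for $k = 2$, $n = 5$ turns up about 10 percent of the time.

```python
import random
from collections import Counter


def random_subset(n, k, rng=random):
    """Sequentially choose a random k-subset of {1, ..., n} (Equations 5.4.1 and 5.4.4)."""
    subset = []
    for j in range(1, n + 1):
        slots = k - len(subset)                 # places still to be filled
        if slots == 0:                          # stop: I_1 + ... + I_{j-1} = k
            break
        if slots == n - j + 1:                  # stop: every remaining element is needed
            subset.extend(range(j, n + 1))
            break
        if rng.random() < slots / (n - j + 1):  # U_j < (k - sum of earlier I_i) / (n - j + 1)
            subset.append(j)
    return tuple(subset)


rng = random.Random(1)
runs = 100_000
counts = Counter(random_subset(5, 2, rng) for _ in range(runs))
for s in sorted(counts):
    print(s, counts[s] / runs)   # each frequency should be close to 0.1
```

At most $n$ random numbers are used, and fewer whenever one of the two stopping rules applies early.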

EXAMPLE 5.4e The random vector $(X, Y)$ is said to have a uniform distribution over the two-dimensional region $R$ if its joint density function is constant for points in $R$, and is 0 for points outside of $R$. That is, if

$$f(x, y) = \begin{cases} c & \text{if } (x, y) \in R \\ 0 & \text{otherwise} \end{cases}$$

then because

$$1 = \iint_R f(x, y)\,dx\,dy = \iint_R c\,dx\,dy = c \times \text{Area of } R$$

it follows that

$$c = \frac{1}{\text{Area of } R}$$

For any region $A \subset R$,

$$P\{(X, Y) \in A\} = \iint_{(x, y) \in A} f(x, y)\,dx\,dy = \iint_{(x, y) \in A} c\,dx\,dy = \frac{\text{Area of } A}{\text{Area of } R}$$

Suppose now that $(X, Y)$ is uniformly distributed over the rectangular region $R$ with corners $(0, 0)$, $(a, 0)$, $(0, b)$, and $(a, b)$. Its joint density function is

$$f(x, y) = \begin{cases} c & \text{if } 0 < x < a,\ 0 < y < b \\ 0 & \text{otherwise} \end{cases}$$

where $c = \dfrac{1}{\text{Area of rectangle}} = \dfrac{1}{ab}$. In this case, $X$ and $Y$ are independent uniform random variables. To show this, note that for $0 \le x \le a$, $0 \le y \le b$,



$$P\{X \le x, Y \le y\} = c\int_0^x\!\!\int_0^y dv\,du = \frac{xy}{ab} \qquad (5.4.5)$$

FIGURE 5.7 The normal density function (a) with $\mu = 0$, $\sigma = 1$ and (b) with arbitrary $\mu$ and $\sigma^2$.



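First letting $y = b$, and then letting $x = a$, in the preceding shows that

$$P\{X \le x\} = \frac{x}{a}, \qquad P\{Y \le y\} = \frac{y}{b} \qquad (5.4.6)$$

Thus, from Equations 5.4.5 and 5.4.6 we can conclude that the joint distribution function of $X$ and $Y$ is the product of their individual distribution functions, so $X$ and $Y$ are independent, with $X$ being uniform on $(0, a)$ and $Y$ being uniform on $(0, b)$. ■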
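The area formula $P\{(X, Y) \in A\} = \text{Area of } A/\text{Area of } R$ can be checked by simulation. In the sketch below the side lengths $a = 2$, $b = 3$ and the helper `estimate` are illustrative assumptions. Points are generated as $(aU_1, bU_2)$ with $U_1, U_2$ independent uniform $(0, 1)$ random numbers; this sampler already builds in the independence just established, so it checks the area formula and Equation 5.4.5 numerically rather than re-proving independence.

```python
import random


def estimate(event, a, b, trials=100_000, seed=0):
    """Monte Carlo estimate of P{(X, Y) in A} for (X, Y) uniform over (0, a) x (0, b)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x, y = a * rng.random(), b * rng.random()   # one point uniform over the rectangle
        hits += event(x, y)                          # True counts as 1
    return hits / trials


a, b = 2.0, 3.0   # illustrative side lengths (assumed)
# A = triangle below the diagonal from (0, 0) to (a, b): Area(A)/Area(R) = 1/2
print(estimate(lambda x, y: y < (b / a) * x, a, b))
# Equation 5.4.5 with x = 1, y = 1: P{X <= 1, Y <= 1} = xy/(ab) = 1/6
print(estimate(lambda x, y: x <= 1 and y <= 1, a, b))
```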

5.5 NORMAL RANDOM VARIABLES

A random variable is said to be normally distributed with parameters $\mu$ and $\sigma^2$, and we write $X \sim \mathcal{N}(\mu, \sigma^2)$, if its density is

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-(x - \mu)^2/2\sigma^2}, \qquad -\infty < x < \infty$$

The normal density $f(x)$ is a bell-shaped curve that is symmetric about $\mu$ and that attains its maximum value of $1/(\sigma\sqrt{2\pi}) \approx 0.399/\sigma$ at $x = \mu$ (see Figure 5.7).
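A quick numerical check of the density formula: the sketch below evaluates $f(x)$ at its peak for two illustrative parameter choices (assumed values) and integrates it with a simple midpoint rule, which is adequate for this smooth, rapidly decaying function. The helper names are our own.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)


def integrate(f, lo, hi, steps=20_000):
    # Midpoint rule over [lo, hi]
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


print(round(normal_pdf(0.0), 4))                     # 0.3989, the peak value 0.399/sigma with sigma = 1
print(round(normal_pdf(5.0, mu=5.0, sigma=2.0), 4))  # 0.1995, i.e. 0.399/2
print(round(integrate(lambda x: normal_pdf(x, 5.0, 2.0), -15.0, 25.0), 6))  # 1.0 (mu +/- 10 sigma)
```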

The normal distribution was introduced by the French mathematician Abraham de Moivre in 1733 and was used by him to approximate probabilities associated with binomial random variables when the binomial parameter $n$ is large. This result was later extended by Laplace and others and is now encompassed in a probability theorem known as the central limit theorem, which gives a theoretical basis to the often noted empirical observation that, in practice, many random phenomena obey, at least approximately, a normal probability distribution. Some examples of this behavior are the height of a person, the velocity in any direction of a molecule in a gas, and the error made in measuring a physical quantity.



* To verify that this is indeed a density function, see Problem 29. 
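The moment generating function of a normal random variable with parameters $\mu$ and $\sigma^2$ is derived as follows:

$$\begin{aligned}
\phi(t) = E[e^{tX}] &= \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty} e^{tx}e^{-(x-\mu)^2/2\sigma^2}\,dx \\
&= \frac{1}{\sqrt{2\pi}}\,e^{\mu t}\int_{-\infty}^{\infty} e^{t\sigma y}e^{-y^2/2}\,dy \qquad \text{by letting } y = \frac{x - \mu}{\sigma} \\
&= \frac{e^{\mu t}}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \exp\left\{-\left[\frac{y^2 - 2t\sigma y}{2}\right]\right\}dy \\
&= \frac{e^{\mu t}}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \exp\left\{-\frac{(y - t\sigma)^2}{2} + \frac{t^2\sigma^2}{2}\right\}dy \\
&= \exp\left\{\mu t + \frac{\sigma^2t^2}{2}\right\}\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-(y - t\sigma)^2/2}\,dy \\
&= \exp\left\{\mu t + \frac{\sigma^2t^2}{2}\right\} \qquad (5.5.1)
\end{aligned}$$

where the last equality follows since

$$\frac{1}{\sqrt{2\pi}}\,e^{-(y - t\sigma)^2/2}$$

is the density of a normal random variable (having parameters $t\sigma$ and 1) and its integral must thus equal 1.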

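Upon differentiating Equation 5.5.1, we obtain

$$\phi'(t) = (\mu + t\sigma^2)\exp\left\{\mu t + \frac{\sigma^2t^2}{2}\right\}$$

$$\phi''(t) = \sigma^2\exp\left\{\mu t + \frac{\sigma^2t^2}{2}\right\} + \exp\left\{\mu t + \frac{\sigma^2t^2}{2}\right\}(\mu + t\sigma^2)^2$$

Hence,

$$E[X] = \phi'(0) = \mu, \qquad E[X^2] = \phi''(0) = \sigma^2 + \mu^2$$

and so

$$E[X] = \mu, \qquad \text{Var}(X) = E[X^2] - (E[X])^2 = \sigma^2$$

Thus $\mu$ and $\sigma^2$ represent, respectively, the mean and variance of the distribution.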
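The step from Equation 5.5.1 to the mean and variance can be mirrored numerically: central differences of $\phi(t)$ at $t = 0$ approximate $\phi'(0) = E[X]$ and $\phi''(0) = E[X^2]$. This is only a sketch; the parameters $\mu = 3$, $\sigma = 4$ and the step size are illustrative assumptions.

```python
import math


def normal_mgf(t, mu, sigma):
    """Equation 5.5.1: phi(t) = exp(mu t + sigma^2 t^2 / 2)."""
    return math.exp(mu * t + sigma ** 2 * t ** 2 / 2)


def moments_from_mgf(phi, h=1e-4):
    # Central differences approximate phi'(0) = E[X] and phi''(0) = E[X^2]
    first = (phi(h) - phi(-h)) / (2 * h)
    second = (phi(h) - 2 * phi(0.0) + phi(-h)) / h ** 2
    return first, second


mu, sigma = 3.0, 4.0          # illustrative parameters (assumed)
m1, m2 = moments_from_mgf(lambda t: normal_mgf(t, mu, sigma))
print(m1, m2 - m1 ** 2)       # approximately 3 and 16, i.e. mu and sigma^2
```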
An important fact about normal random variables is that if $X$ is normal with mean $\mu$ and variance $\sigma^2$, then $Y = \alpha X + \beta$ is normal with mean $\alpha\mu + \beta$ and variance $\alpha^2\sigma^2$. That this is so can easily be seen by using moment generating functions as follows.

$$\begin{aligned}
E[e^{t(\alpha X + \beta)}] &= e^{\beta t}E[e^{\alpha tX}] \\
&= e^{\beta t}\exp\{\mu\alpha t + \sigma^2(\alpha t)^2/2\} \qquad \text{from Equation 5.5.1} \\
&= \exp\{(\beta + \mu\alpha)t + \alpha^2\sigma^2t^2/2\}
\end{aligned}$$

Because the final equation is the moment generating function of the normal random variable with mean $\beta + \mu\alpha$ and variance $\alpha^2\sigma^2$, the result follows.
It follows from the foregoing that if $X \sim \mathcal{N}(\mu, \sigma^2)$, then

$$Z = \frac{X - \mu}{\sigma}$$

is a normal random variable with mean 0 and variance 1. Such a random variable $Z$ is said to have a standard, or unit, normal distribution. Let $\Phi(\cdot)$ denote its distribution function. That is,

$$\Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-y^2/2}\,dy, \qquad -\infty < x < \infty$$

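This result that $Z = (X - \mu)/\sigma$ has a standard normal distribution when $X$ is normal with parameters $\mu$ and $\sigma^2$ is quite important, for it enables us to write all probability statements about $X$ in terms of probabilities for $Z$. For instance, to obtain $P\{X < b\}$, we note that $X$ will be less than $b$ if and only if $(X - \mu)/\sigma$ is less than $(b - \mu)/\sigma$, and so

$$P\{X < b\} = P\left\{\frac{X - \mu}{\sigma} < \frac{b - \mu}{\sigma}\right\} = \Phi\left(\frac{b - \mu}{\sigma}\right)$$

Similarly, for any $a < b$,

$$\begin{aligned}
P\{a < X < b\} &= P\left\{\frac{a - \mu}{\sigma} < \frac{X - \mu}{\sigma} < \frac{b - \mu}{\sigma}\right\} \\
&= P\left\{\frac{a - \mu}{\sigma} < Z < \frac{b - \mu}{\sigma}\right\} \\
&= P\left\{Z < \frac{b - \mu}{\sigma}\right\} - P\left\{Z < \frac{a - \mu}{\sigma}\right\} \\
&= \Phi\left(\frac{b - \mu}{\sigma}\right) - \Phi\left(\frac{a - \mu}{\sigma}\right)
\end{aligned}$$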
FIGURE 5.8 Standard normal probabilities.



It remains for us to compute $\Phi(x)$. This has been accomplished by an approximation, and the results are presented in Table A1 of the Appendix, which tabulates $\Phi(x)$ (to a 4-digit level of accuracy) for a wide range of nonnegative values of $x$. In addition, Program 5.5a of the text disk can be used to obtain $\Phi(x)$.

While Table A1 tabulates $\Phi(x)$ only for nonnegative values of $x$, we can also obtain $\Phi(-x)$ from the table by making use of the symmetry (about 0) of the standard normal probability density function. That is, for $x > 0$, if $Z$ represents a standard normal random variable, then (see Figure 5.8)

$$\begin{aligned}
\Phi(-x) &= P\{Z < -x\} \\
&= P\{Z > x\} \qquad \text{by symmetry} \\
&= 1 - \Phi(x)
\end{aligned}$$

Thus, for instance,

$$P\{Z < -1\} = \Phi(-1) = 1 - \Phi(1) = 1 - .8413 = .1587$$
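Outside of Table A1 and Program 5.5a, $\Phi$ can be computed from the error function through the identity $\Phi(x) = \frac{1}{2}[1 + \text{erf}(x/\sqrt{2})]$. The sketch below uses Python's `math.erf`; the name `Phi` is our own.

```python
import math


def Phi(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


print(round(Phi(1.0), 4))       # 0.8413
print(round(Phi(-1.0), 4))      # 0.1587
print(round(1 - Phi(1.0), 4))   # 0.1587, the symmetry relation Phi(-x) = 1 - Phi(x)
```

Python's `statistics.NormalDist().cdf` computes the same function and is used in the sketches that follow.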

EXAMPLE 5.5a If $X$ is a normal random variable with mean $\mu = 3$ and variance $\sigma^2 = 16$, find

(a) $P\{X < 11\}$;

(b) $P\{X > -1\}$;

(c) $P\{2 < X < 7\}$.



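SOLUTION

(a)
$$P\{X < 11\} = P\left\{\frac{X - 3}{4} < \frac{11 - 3}{4}\right\} = \Phi(2) = .9772$$

(b)
$$P\{X > -1\} = P\left\{\frac{X - 3}{4} > \frac{-1 - 3}{4}\right\} = P\{Z > -1\} = P\{Z < 1\} = .8413$$

(c)
$$\begin{aligned}
P\{2 < X < 7\} &= P\left\{\frac{2 - 3}{4} < \frac{X - 3}{4} < \frac{7 - 3}{4}\right\} \\
&= \Phi(1) - \Phi(-1/4) \\
&= \Phi(1) - (1 - \Phi(1/4)) \\
&= .8413 + .5987 - 1 = .4400
\end{aligned}$$ ■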
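The three answers can be reproduced with Python's `statistics.NormalDist`, which takes the standard deviation (here $\sigma = 4$), not the variance. The small difference in (c) from the value .4400 comes from the four-digit table values used above.

```python
from statistics import NormalDist

X = NormalDist(mu=3, sigma=4)       # variance 16, so sigma = 4
a = X.cdf(11)                       # (a) P{X < 11}
b = 1 - X.cdf(-1)                   # (b) P{X > -1}
c = X.cdf(7) - X.cdf(2)             # (c) P{2 < X < 7}
print(round(a, 4), round(b, 4), round(c, 4))   # 0.9772 0.8413 0.4401
```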

EXAMPLE 5.5b Suppose that a binary message, either "0" or "1", must be transmitted by wire from location A to location B. However, the data sent over the wire are subject to a channel noise disturbance, and so to reduce the possibility of error, the value 2 is sent over the wire when the message is "1" and the value $-2$ is sent when the message is "0." If $x$, $x = \pm 2$, is the value sent at location A, then $R$, the value received at location B, is given by $R = x + N$, where $N$ is the channel noise disturbance. When the message is received at location B, the receiver decodes it according to the following rule:

if $R \ge .5$, then "1" is concluded
if $R < .5$, then "0" is concluded

Because the channel noise is often normally distributed, we will determine the error probabilities when $N$ is a standard normal random variable.

There are two types of errors that can occur: one is that the message "1" can be incorrectly concluded to be "0", and the other that "0" is incorrectly concluded to be "1." The first type of error will occur if the message is "1" and $2 + N < .5$, whereas the second will occur if the message is "0" and $-2 + N \ge .5$.

Hence,

$$P\{\text{error} \mid \text{message is "1"}\} = P\{N < -1.5\} = 1 - \Phi(1.5) = .0668$$

and

$$P\{\text{error} \mid \text{message is "0"}\} = P\{N \ge 2.5\} = 1 - \Phi(2.5) = .0062$$ ■
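The following sketch computes both error probabilities exactly and also simulates the channel, sending each message many times through standard normal noise and applying the decoding rule. The helper `decode`, the seed, and the number of trials are illustrative assumptions.

```python
import random
from statistics import NormalDist

Z = NormalDist()                       # channel noise N ~ N(0, 1)
exact_1 = Z.cdf(-1.5)                  # P{2 + N < .5}
exact_0 = 1 - Z.cdf(2.5)               # P{-2 + N >= .5}


def decode(r):
    return 1 if r >= 0.5 else 0        # decoding rule with threshold .5


rng = random.Random(42)
trials = 200_000
err_1 = sum(decode(2 + rng.gauss(0, 1)) != 1 for _ in range(trials)) / trials
err_0 = sum(decode(-2 + rng.gauss(0, 1)) != 0 for _ in range(trials)) / trials
print(round(exact_1, 4), round(exact_0, 4))   # 0.0668 0.0062
print(err_1, err_0)                           # simulated error rates, close to the exact values
```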

EXAMPLE 5.5c The power $W$ dissipated in a resistor is proportional to the square of the voltage $V$. That is,

$$W = rV^2$$

where $r$ is a constant. If $r = 3$, and $V$ can be assumed (to a very good approximation) to be a normal random variable with mean 6 and standard deviation 1, find

(a) $E[W]$;

(b) $P\{W > 120\}$.

SOLUTION

(a)
$$E[W] = E[3V^2] = 3E[V^2] = 3(\text{Var}[V] + E^2[V]) = 3(1 + 36) = 111$$

(b)
$$\begin{aligned}
P\{W > 120\} &= P\{3V^2 > 120\} \\
&= P\{V > \sqrt{40}\} \\
&= P\{V - 6 > \sqrt{40} - 6\} \\
&= P\{Z > .3246\} \\
&= 1 - \Phi(.3246) \\
&= .3727
\end{aligned}$$

Here the event $V < -\sqrt{40}$, which would also give $3V^2 > 120$, has been ignored because it lies more than 12 standard deviations below the mean and so has negligible probability. ■
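Both parts can be checked with `statistics.NormalDist`, whose `mean` and `variance` properties give $E[V]$ and $\text{Var}(V)$ directly. As in the text, the sketch ignores the negligible event $V < -\sqrt{40}$.

```python
import math
from statistics import NormalDist

r = 3
V = NormalDist(mu=6, sigma=1)
EW = r * (V.variance + V.mean ** 2)          # (a) E[W] = r(Var[V] + E[V]^2)
z = (math.sqrt(120 / r) - V.mean) / V.stdev  # standardized threshold (sqrt(40) - 6)/1
pW = 1 - V.cdf(math.sqrt(120 / r))           # (b) P{V > sqrt(40)}
print(EW, round(z, 4), round(pW, 3))         # 111.0 0.3246 0.373
```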



Another important result is that the sum of independent normal random variables is also a normal random variable. To see this, suppose that $X_i$, $i = 1, \ldots, n$, are independent, with $X_i$ being normal with mean $\mu_i$ and variance $\sigma_i^2$. The moment generating function of $\sum_{i=1}^n X_i$ is as follows.

$$\begin{aligned}
E\left[\exp\left\{t\sum_{i=1}^n X_i\right\}\right] &= E[e^{tX_1}e^{tX_2}\cdots e^{tX_n}] \\
&= \prod_{i=1}^n E[e^{tX_i}] \qquad \text{by independence} \\
&= \prod_{i=1}^n e^{\mu_i t + \sigma_i^2t^2/2} \\
&= e^{\mu t + \sigma^2t^2/2}
\end{aligned}$$

where

$$\mu = \sum_{i=1}^n \mu_i, \qquad \sigma^2 = \sum_{i=1}^n \sigma_i^2$$

Therefore, $\sum_{i=1}^n X_i$ has the same moment generating function as a normal random variable having mean $\mu$ and variance $\sigma^2$. Hence, from the one-to-one correspondence between moment generating functions and distributions, we can conclude that $\sum_{i=1}^n X_i$ is normal with mean $\sum_{i=1}^n \mu_i$ and variance $\sum_{i=1}^n \sigma_i^2$.

EXAMPLE 5.5d Data from the National Oceanic and Atmospheric Administration indicate 
that the yearly precipitation in Los Angeles is a normal random variable with a mean of 
12.08 inches and a standard deviation of 3.1 inches. 

(a) Find the probability that the total precipitation during the next 2 years will exceed 
25 inches. 

(b) Find the probability that next year's precipitation will exceed that of the 
following year by more than 3 inches. 

Assume that the precipitation totals for the next 2 years are independent. 

SOLUTION Let $X_1$ and $X_2$ be the precipitation totals for the next 2 years.

(a) Since $X_1 + X_2$ is normal with mean 24.16 and variance $2(3.1)^2 = 19.22$, it follows that

$$P\{X_1 + X_2 > 25\} = P\left\{\frac{X_1 + X_2 - 24.16}{\sqrt{19.22}} > \frac{25 - 24.16}{\sqrt{19.22}}\right\} = P\{Z > .1916\} \approx .4240$$

(b) Since $-X_2$ is a normal random variable with mean $-12.08$ and variance $(-1)^2(3.1)^2$, it follows that $X_1 - X_2$ is normal with mean 0 and variance 19.22. Hence,

$$P\{X_1 > X_2 + 3\} = P\{X_1 - X_2 > 3\} = P\left\{\frac{X_1 - X_2}{\sqrt{19.22}} > \frac{3}{\sqrt{19.22}}\right\} = P\{Z > .6843\} \approx .2469$$

Thus there is a 42.4 percent chance that the total precipitation in Los Angeles during the next 2 years will exceed 25 inches, and there is a 24.69 percent chance that next year's precipitation will exceed that of the following year by more than 3 inches. ■
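Python's `statistics.NormalDist` supports `+` and `-` between independent normal distributions, adding means (or subtracting them) and adding variances, which is exactly the result just derived. The sketch below uses it to reproduce both parts of Example 5.5d.

```python
from statistics import NormalDist

X1 = NormalDist(mu=12.08, sigma=3.1)
X2 = NormalDist(mu=12.08, sigma=3.1)
total = X1 + X2        # mean 24.16, variance 19.22
diff = X1 - X2         # mean 0, variance 19.22
print(round(total.mean, 2), round(total.variance, 2), round(diff.variance, 2))   # 24.16 19.22 19.22
print(1 - total.cdf(25))   # part (a), about .4240
print(1 - diff.cdf(3))     # part (b), about .2469
```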

For $\alpha \in (0, 1)$, let $z_\alpha$ be such that

$$P\{Z > z_\alpha\} = 1 - \Phi(z_\alpha) = \alpha$$

That is, the probability that a standard normal random variable is greater than $z_\alpha$ is equal to $\alpha$ (see Figure 5.9).

FIGURE 5.9 $P\{Z > z_\alpha\} = \alpha$.

The value of $z_\alpha$ can, for any $\alpha$, be obtained from Table A1. For instance, since

$$1 - \Phi(1.645) = .05, \qquad 1 - \Phi(1.96) = .025, \qquad 1 - \Phi(2.33) = .01$$

it follows that

$$z_{.05} = 1.645, \qquad z_{.025} = 1.96, \qquad z_{.01} = 2.33$$

Program 5.5b on the text disk can also be used to obtain the value of $z_\alpha$.

Since

$$P\{Z < z_\alpha\} = 1 - \alpha$$

it follows that $100(1 - \alpha)$ percent of the time a standard normal random variable will be less than $z_\alpha$. As a result, we call $z_\alpha$ the $100(1 - \alpha)$ percentile of the standard normal distribution.
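Since $z_\alpha$ is the $100(1 - \alpha)$ percentile, it is the inverse of $\Phi$ evaluated at $1 - \alpha$. The sketch below uses `NormalDist().inv_cdf` for this; the helper name `z` is our own. Its third value, about 2.326, agrees with the table value 2.33 after rounding to two decimals.

```python
from statistics import NormalDist


def z(alpha):
    """Upper-alpha point of the standard normal: P{Z > z_alpha} = alpha."""
    return NormalDist().inv_cdf(1 - alpha)


for alpha in (0.05, 0.025, 0.01):
    print(alpha, round(z(alpha), 3))
```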

5.6 EXPONENTIAL RANDOM VARIABLES

A continuous random variable whose probability density function is given, for some $\lambda > 0$, by

$$f(x) = \begin{cases} \lambda e^{-\lambda x} & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases}$$

is said to be an exponential random variable (or, more simply, is said to be exponentially distributed) with parameter $\lambda$. The cumulative distribution function $F(x)$ of an exponential random variable is given by

$$F(x) = P\{X \le x\} = \int_0^x \lambda e^{-\lambda y}\,dy = 1 - e^{-\lambda x}, \qquad x \ge 0$$
The exponential distribution often arises, in practice, as being the distribution of the amount of time until some specific event occurs. For instance, the amount of time (starting from now) until an earthquake occurs, or until a new war breaks out, or until a telephone call you receive turns out to be a wrong number are all random variables that tend in practice to have exponential distributions (see Section 5.6.1 for an explanation).

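The moment generating function of the exponential is given by

$$\begin{aligned}
\phi(t) = E[e^{tX}] &= \int_0^\infty e^{tx}\lambda e^{-\lambda x}\,dx \\
&= \lambda\int_0^\infty e^{-(\lambda - t)x}\,dx \\
&= \frac{\lambda}{\lambda - t}, \qquad t < \lambda
\end{aligned}$$

Differentiation yields

$$\phi'(t) = \frac{\lambda}{(\lambda - t)^2}, \qquad \phi''(t) = \frac{2\lambda}{(\lambda - t)^3}$$

and so

$$\begin{aligned}
E[X] &= \phi'(0) = 1/\lambda \\
\text{Var}(X) &= \phi''(0) - (E[X])^2 = 2/\lambda^2 - 1/\lambda^2 = 1/\lambda^2
\end{aligned}$$

Thus $\lambda$ is the reciprocal of the mean, and the variance is equal to the square of the mean.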
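The mean $1/\lambda$ and variance $1/\lambda^2$ can be checked by sampling. The sketch below generates exponential values by inverse-transform sampling (a standard technique not discussed in this section): solving $U = 1 - e^{-\lambda x}$ for $x$ gives $x = -\ln(1 - U)/\lambda$. The rate $\lambda = 0.5$, the seed, and the helper name are illustrative assumptions.

```python
import math
import random
import statistics


def sample_exponential(lam, rng):
    # Inverse transform: if U is uniform (0, 1), then -ln(1 - U)/lam has F(x) = 1 - e^{-lam x}
    return -math.log(1.0 - rng.random()) / lam


lam = 0.5                                   # illustrative rate (assumed)
rng = random.Random(3)
xs = [sample_exponential(lam, rng) for _ in range(200_000)]
print(statistics.fmean(xs), 1 / lam)          # sample mean vs E[X] = 1/lambda = 2
print(statistics.pvariance(xs), 1 / lam**2)   # sample variance vs Var(X) = 1/lambda^2 = 4
```

Python's `random.expovariate(lam)` produces the same distribution directly and is used in the later sketches.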
The key property of an exponential random variable is that it is memoryless, where we say that a nonnegative random variable $X$ is memoryless if

$$P\{X > s + t \mid X > t\} = P\{X > s\} \qquad \text{for all } s, t \ge 0 \qquad (5.6.1)$$

To understand why Equation 5.6.1 is called the memoryless property, imagine that $X$ represents the length of time that a certain item functions before failing. Now let us consider the probability that an item that is still functioning at age $t$ will continue to function for at least an additional time $s$. Since this will be the case if the total functional lifetime of the item exceeds $t + s$ given that the item is still functioning at $t$, we see that

$$P\{\text{additional functional life of } t\text{-unit-old item exceeds } s\} = P\{X > t + s \mid X > t\}$$

Thus, we see that Equation 5.6.1 states that the distribution of additional functional life of an item of age $t$ is the same as that of a new item; in other words, when Equation 5.6.1 is satisfied, there is no need to remember the age of a functional item, since as long as it is still functional it is "as good as new."
The condition in Equation 5.6.1 is equivalent to

$$\frac{P\{X > s + t, X > t\}}{P\{X > t\}} = P\{X > s\}$$

or

$$P\{X > s + t\} = P\{X > s\}P\{X > t\} \qquad (5.6.2)$$

When $X$ is an exponential random variable, then

$$P\{X > x\} = e^{-\lambda x}, \qquad x > 0$$

and so Equation 5.6.2 is satisfied (since $e^{-\lambda(s + t)} = e^{-\lambda s}e^{-\lambda t}$). Hence, exponentially distributed random variables are memoryless (and in fact it can be shown that they are the only random variables that are memoryless).
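The memoryless property can be seen in simulated data: among exponential samples that survive past $t$, the fraction that survive an additional $s$ should be about $e^{-\lambda s}$ whatever the value of $t$. The rate $\lambda = 1$, the value $s = 0.7$, the seed, and the helper name are illustrative assumptions.

```python
import math
import random


def conditional_tail(samples, s, t):
    """Empirical estimate of P{X > s + t | X > t} from a list of samples."""
    survivors = [x for x in samples if x > t]
    return sum(x > s + t for x in survivors) / len(survivors)


rng = random.Random(11)
xs = [rng.expovariate(1.0) for _ in range(300_000)]   # rate lambda = 1 (assumed)
s = 0.7
for t in (0.0, 1.0, 2.0):
    # The estimate stays near e^{-s} no matter how old the item already is
    print(t, round(conditional_tail(xs, s, t), 3), round(math.exp(-s), 3))
```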

EXAMPLE 5.6a Suppose that the number of miles that a car can run before its battery wears out is exponentially distributed with an average value of 10,000 miles. If a person desires to take a 5,000-mile trip, what is the probability that she will be able to complete her trip without having to replace her car battery? What can be said when the distribution is not exponential?

SOLUTION It follows, by the memoryless property of the exponential distribution, that the remaining lifetime (in thousands of miles) of the battery is exponential with parameter $\lambda = 1/10$. Hence the desired probability is

$$P\{\text{remaining lifetime} > 5\} = 1 - F(5) = e^{-5\lambda} = e^{-1/2} \approx .607$$
However, if the lifetime distribution $F$ is not exponential, then the relevant probability is

$$P\{\text{lifetime} > t + 5 \mid \text{lifetime} > t\} = \frac{1 - F(t + 5)}{1 - F(t)}$$

where $t$ is the number of miles (in thousands) that the battery had been in use prior to the start of the trip. Therefore, if the distribution is not exponential, additional information is needed (namely, $t$) before the desired probability can be calculated. ■
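The contrast drawn at the end of Example 5.6a can be made concrete. The sketch evaluates $[1 - F(t + 5)]/[1 - F(t)]$ for the exponential lifetime of the example and for a hypothetical non-exponential lifetime, uniform on $(0, 20)$ thousand miles, which has the same mean of 10. All helper names are our own.

```python
import math

lam = 1 / 10          # rate per thousand miles (mean lifetime 10 thousand miles)
print(round(math.exp(-5 * lam), 4))   # 0.6065, i.e. e^{-1/2}


def conditional_survival(F, t, extra=5):
    """P{lifetime > t + extra | lifetime > t} for a lifetime distribution function F."""
    return (1 - F(t + extra)) / (1 - F(t))


def exp_F(x):
    return 1 - math.exp(-lam * x)


def unif_F(x):
    # Hypothetical non-exponential lifetime: uniform on (0, 20), also with mean 10
    return min(max(x / 20, 0.0), 1.0)


for t in (0, 8):
    print(t, round(conditional_survival(exp_F, t), 4), round(conditional_survival(unif_F, t), 4))
```

The exponential column stays at 0.6065 for both ages $t = 0$ and $t = 8$, while the uniform column drops from 0.75 to 0.5833, so for that lifetime the answer does depend on $t$.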

For another illustration of the memoryless property, consider the following example. 

EXAMPLE 5.6b A crew of workers has 3 interchangeable machines, of which 2 must be working for the crew to do its job. When in use, each machine will function for an exponentially distributed time having parameter $\lambda$ before breaking down. The workers decide to initially use machines A and B and keep machine C in reserve to replace whichever of A or B breaks down first. They will then be able to continue working until one of the remaining machines breaks down. When the crew is forced to stop working because only one of the machines has not yet broken down, what is the probability that the still operable machine is machine C?

SOLUTION This can be easily answered, without any need for computations, by invoking the memoryless property of the exponential distribution. The argument is as follows: consider the moment at which machine C is first put in use. At that time either A or B would have just broken down, and the other one (call it machine M) will still be functioning. Now even though M would have already been functioning for some time, by the memoryless property of the exponential distribution, its remaining lifetime has the same distribution as that of a machine that is just being put into use. Thus, the remaining lifetimes of machine M and machine C have the same distribution and so, by symmetry, the probability that M will fail before C is 1/2. ■
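A short simulation supports the symmetry argument. Each run draws exponential lifetimes for A and B, starts C when the first of them fails, and records whether C outlasts the survivor M. The rate, seed, and helper name are illustrative assumptions; the answer should not depend on the rate.

```python
import random


def c_outlasts(rng, lam=1.0):
    """One run of Example 5.6b: True if machine C is the one still working at the end."""
    a, b = rng.expovariate(lam), rng.expovariate(lam)
    first_failure = min(a, b)                           # C is put in use at this time
    m_failure = max(a, b)                               # failure time of machine M
    c_failure = first_failure + rng.expovariate(lam)    # C's own exponential lifetime
    return c_failure > m_failure


rng = random.Random(5)
runs = 200_000
print(sum(c_outlasts(rng) for _ in range(runs)) / runs)   # close to 1/2
```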

The following proposition presents another property of the exponential distribution. 

PROPOSITION 5.6.1 If $X_1, X_2, \ldots, X_n$ are independent exponential random variables having respective parameters $\lambda_1, \lambda_2, \ldots, \lambda_n$, then $\min(X_1, X_2, \ldots, X_n)$ is exponential with parameter $\sum_{i=1}^n \lambda_i$.

Proof

Since the smallest value of a set of numbers is greater than $x$ if and only if all values are greater than $x$, we have

$$\begin{aligned}
P\{\min(X_1, X_2, \ldots, X_n) > x\} &= P\{X_1 > x, X_2 > x, \ldots, X_n > x\} \\
&= \prod_{i=1}^n P\{X_i > x\} \qquad \text{by independence} \\
&= \prod_{i=1}^n e^{-\lambda_i x} \\
&= e^{-\left(\sum_{i=1}^n \lambda_i\right)x}
\end{aligned}$$

EXAMPLE 5.6c A series system is one that needs all of its components to function in order for the system itself to be functional. For an $n$-component series system in which the component lifetimes are independent exponential random variables with respective parameters $\lambda_1, \lambda_2, \ldots, \lambda_n$, what is the probability that the system survives for a time $t$?

SOLUTION Since the system life is equal to the minimal component life, it follows from Proposition 5.6.1 that

$$P\{\text{system life exceeds } t\} = e^{-\sum_{i=1}^n \lambda_i t}$$ ■
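Proposition 5.6.1 can be checked by simulating a series system directly. The three component rates and the time $t = 0.3$ below are hypothetical values chosen only for illustration; the simulated survival frequency should be close to $e^{-(0.5 + 1 + 2.5)(0.3)} = e^{-1.2}$.

```python
import math
import random

rates = [0.5, 1.0, 2.5]     # hypothetical component rates lambda_1, lambda_2, lambda_3
t = 0.3
rng = random.Random(9)
runs = 200_000
# The system survives past t only if its shortest component lifetime exceeds t
survived = sum(min(rng.expovariate(r) for r in rates) > t for _ in range(runs))
print(survived / runs, math.exp(-sum(rates) * t))   # simulated vs e^{-(sum of rates) t}
```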

Another useful property of exponential random variables is that $cX$ is exponential with parameter $\lambda/c$ when $X$ is exponential with parameter $\lambda$ and $c > 0$. This follows since

$$P\{cX \le x\} = P\{X \le x/c\} = 1 - e^{-\lambda x/c}$$

The parameter $\lambda$ is called the rate of the exponential distribution.

*5.6.1 The Poisson Process

Suppose that "events" are occurring at random time points, and let $N(t)$ denote the number of events that occur in the time interval $[0, t]$. These events are said to constitute a Poisson process having rate $\lambda$, $\lambda > 0$, if

(a) $N(0) = 0$.

(b) The numbers of events that occur in disjoint time intervals are independent.

(c) The distribution of the number of events that occur in a given interval depends only on the length of the interval and not on its location.

(d) $\displaystyle\lim_{h \to 0} \frac{P\{N(h) = 1\}}{h} = \lambda$.

(e) $\displaystyle\lim_{h \to 0} \frac{P\{N(h) \ge 2\}}{h} = 0$.

Thus, Condition (a) states that the process begins at time 0. Condition (b), the independent increment assumption, states for instance that the number of events by time $t$ [that is, $N(t)$] is independent of the number of events that occur between $t$ and $t + s$ [that is, $N(t + s) - N(t)$]. Condition (c), the stationary increment assumption, states that the probability distribution of $N(t + s) - N(t)$ is the same for all values of $t$. Conditions (d) and (e) state that in a small interval of length $h$, the probability of one event occurring is approximately $\lambda h$, whereas the probability of 2 or more is approximately 0.

We will now show that these assumptions imply that the number of events occurring in any interval of length $t$ is a Poisson random variable with parameter $\lambda t$. To be precise, let us call the interval $[0, t]$ and denote by $N(t)$ the number of events occurring in that interval. To obtain an expression for $P\{N(t) = k\}$, we start by breaking the interval $[0, t]$ into $n$ nonoverlapping subintervals, each of length $t/n$ (Figure 5.10). Now there will be $k$ events in $[0, t]$ if either

(i) $N(t)$ equals $k$ and there is at most one event in each subinterval; or

(ii) $N(t)$ equals $k$ and at least one of the subintervals contains 2 or more events.

Since these two possibilities are clearly mutually exclusive, and since Condition (i) is equivalent to the statement that $k$ of the $n$ subintervals contain exactly 1 event and the other $n - k$ contain 0 events, we have that

$$\begin{aligned}
P\{N(t) = k\} = {} & P\{k \text{ of the } n \text{ subintervals contain exactly 1 event and the other } n - k \text{ contain 0 events}\} \\
& + P\{N(t) = k \text{ and at least 1 subinterval contains 2 or more events}\} \qquad (5.6.3)
\end{aligned}$$



Now it can be shown, using Condition (e), that

$$P\{N(t) = k \text{ and at least 1 subinterval contains 2 or more events}\} \to 0 \quad \text{as } n \to \infty \qquad (5.6.4)$$



Also, it follows from Conditions (d) and (e) that

$$P\{\text{exactly 1 event in a subinterval}\} \approx \frac{\lambda t}{n}, \qquad P\{0 \text{ events in a subinterval}\} \approx 1 - \frac{\lambda t}{n}$$

Hence, since the numbers of events that occur in different subintervals are independent [from Condition (b)], it follows that

$$P\{k \text{ of the subintervals contain exactly 1 event and the other } n - k \text{ contain 0 events}\} \approx \binom{n}{k}\left(\frac{\lambda t}{n}\right)^k\left(1 - \frac{\lambda t}{n}\right)^{n-k} \qquad (5.6.5)$$

with the approximation becoming exact as the number of subintervals, $n$, goes to $\infty$. However, the probability in Equation 5.6.5 is just the probability that a binomial random variable with parameters $n$ and $p = \lambda t/n$ equals $k$. Hence, as $n$ becomes larger and larger, this approaches the probability that a Poisson random variable with mean $n\lambda t/n = \lambda t$ equals $k$. Hence, from Equations 5.6.3, 5.6.4, and 5.6.5, we see upon letting $n$ approach $\infty$ that

$$P\{N(t) = k\} = e^{-\lambda t}\frac{(\lambda t)^k}{k!}$$

We have shown:

PROPOSITION 5.6.2 For a Poisson process having rate $\lambda$,

$$P\{N(t) = k\} = e^{-\lambda t}\frac{(\lambda t)^k}{k!}, \qquad k = 0, 1, \ldots$$

That is, the number of events in any interval of length $t$ has a Poisson distribution with mean $\lambda t$.

For a Poisson process, let $X_1$ denote the time of the first event. Further, for $n > 1$, let $X_n$ denote the elapsed time between the $(n - 1)$st and the $n$th events. The sequence $\{X_n, n = 1, 2, \ldots\}$ is called the sequence of interarrival times. For instance, if $X_1 = 5$ and $X_2 = 10$, then the first event of the Poisson process would have occurred at time 5 and the second at time 15.

We now determine the distribution of the $X_n$. To do so, we first note that the event $\{X_1 > t\}$ takes place if and only if no events of the Poisson process occur in the interval $[0, t]$, and thus,

$$P\{X_1 > t\} = P\{N(t) = 0\} = e^{-\lambda t}$$

Hence, $X_1$ has an exponential distribution with mean $1/\lambda$. To obtain the distribution of $X_2$, note that

$$\begin{aligned}
P\{X_2 > t \mid X_1 = s\} &= P\{0 \text{ events in } (s, s + t] \mid X_1 = s\} \\
&= P\{0 \text{ events in } (s, s + t]\} \\
&= e^{-\lambda t}
\end{aligned}$$

where the last two equations followed from independent and stationary increments. Therefore, from the foregoing we conclude that $X_2$ is also an exponential random variable with mean $1/\lambda$, and furthermore, that $X_2$ is independent of $X_1$. Repeating the same argument yields:

PROPOSITION 5.6.3 $X_1, X_2, \ldots$ are independent exponential random variables, each with mean $1/\lambda$.
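Propositions 5.6.2 and 5.6.3 can be seen together in a simulation that goes in the opposite direction: it builds event times by adding independent exponential interarrival times (it is a standard fact that this construction gives a Poisson process) and then compares the observed distribution of $N(t)$ with the Poisson probabilities $e^{-\lambda t}(\lambda t)^k/k!$. The rate, horizon, seed, and helper name are illustrative assumptions.

```python
import math
import random
from collections import Counter


def count_events(rate, t, rng):
    """N(t) for a process whose interarrival times are independent exponentials."""
    n = 0
    clock = rng.expovariate(rate)        # time of the first event, X_1
    while clock <= t:
        n += 1
        clock += rng.expovariate(rate)   # add the next interarrival time
    return n


rate, t, runs = 2.0, 1.5, 100_000        # illustrative values (assumed); lambda * t = 3
rng = random.Random(2)
counts = Counter(count_events(rate, t, rng) for _ in range(runs))
for k in range(6):
    poisson_k = math.exp(-rate * t) * (rate * t) ** k / math.factorial(k)
    print(k, counts[k] / runs, round(poisson_k, 4))   # simulated vs Poisson probability
```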
*5.7 THE GAMMA DISTRIBUTION

A random variable is said to have a gamma distribution with parameters $(\alpha, \lambda)$, $\lambda > 0$, $\alpha > 0$, if its density function is given by

$$f(x) = \begin{cases} \dfrac{\lambda e^{-\lambda x}(\lambda x)^{\alpha - 1}}{\Gamma(\alpha)} & x \ge 0 \\ 0 & x < 0 \end{cases}$$

where

$$\Gamma(\alpha) = \int_0^\infty \lambda e^{-\lambda x}(\lambda x)^{\alpha - 1}\,dx = \int_0^\infty e^{-y}y^{\alpha - 1}\,dy \qquad (\text{by letting } y = \lambda x)$$

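The integration by parts formula $\int u\,dv = uv - \int v\,du$ yields, with $u = y^{\alpha - 1}$, $dv = e^{-y}\,dy$, $v = -e^{-y}$, that for $\alpha > 1$,

$$\begin{aligned}
\int_0^\infty e^{-y}y^{\alpha - 1}\,dy &= -e^{-y}y^{\alpha - 1}\Big|_{y=0}^{y=\infty} + \int_0^\infty e^{-y}(\alpha - 1)y^{\alpha - 2}\,dy \\
&= (\alpha - 1)\int_0^\infty e^{-y}y^{\alpha - 2}\,dy
\end{aligned}$$

or

$$\Gamma(\alpha) = (\alpha - 1)\Gamma(\alpha - 1) \qquad (5.7.1)$$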

When $\alpha$ is an integer, say, $\alpha = n$, we can iterate the foregoing to obtain that

$$\begin{aligned}
\Gamma(n) &= (n - 1)\Gamma(n - 1) \\
&= (n - 1)(n - 2)\Gamma(n - 2) \qquad \text{by letting } \alpha = n - 1 \text{ in Eq. 5.7.1} \\
&= (n - 1)(n - 2)(n - 3)\Gamma(n - 3) \qquad \text{by letting } \alpha = n - 2 \text{ in Eq. 5.7.1} \\
&\ \ \vdots \\
&= (n - 1)!\,\Gamma(1)
\end{aligned}$$

Because

$$\Gamma(1) = \int_0^\infty e^{-y}\,dy = 1$$

we see that

$$\Gamma(n) = (n - 1)!$$

The function $\Gamma(\alpha)$ is called the gamma function.
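Python's standard `math.gamma` can be used to check Equation 5.7.1 and the identity $\Gamma(n) = (n - 1)!$ numerically. The non-integer argument 3.7 below is an arbitrary illustrative choice.

```python
import math

# Gamma(n) = (n - 1)! for positive integers n
for n in range(1, 11):
    print(n, math.gamma(n), math.factorial(n - 1))

# Equation 5.7.1 at an arbitrary non-integer alpha (illustrative)
alpha = 3.7
print(math.gamma(alpha), (alpha - 1) * math.gamma(alpha - 1))   # equal up to rounding
```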

It should be noted that when $\alpha = 1$, the gamma distribution reduces to the exponential with mean $1/\lambda$.

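The moment generating function of a gamma random variable $X$ with parameters $(\alpha, \lambda)$ is obtained as follows: for $t < \lambda$,

$$\begin{aligned}
\phi(t) = E[e^{tX}] &= \frac{\lambda^\alpha}{\Gamma(\alpha)}\int_0^\infty e^{tx}e^{-\lambda x}x^{\alpha - 1}\,dx \\
&= \frac{\lambda^\alpha}{\Gamma(\alpha)}\int_0^\infty e^{-(\lambda - t)x}x^{\alpha - 1}\,dx \\
&= \left(\frac{\lambda}{\lambda - t}\right)^\alpha\frac{1}{\Gamma(\alpha)}\int_0^\infty e^{-y}y^{\alpha - 1}\,dy \qquad \text{by letting } y = (\lambda - t)x \\
&= \left(\frac{\lambda}{\lambda - t}\right)^\alpha \qquad (5.7.2)
\end{aligned}$$

Differentiation of Equation 5.7.2 yields

$$\phi'(t) = \frac{\alpha\lambda^\alpha}{(\lambda - t)^{\alpha + 1}}, \qquad \phi''(t) = \frac{\alpha(\alpha + 1)\lambda^\alpha}{(\lambda - t)^{\alpha + 2}}$$

Hence,

$$E[X] = \phi'(0) = \frac{\alpha}{\lambda} \qquad (5.7.3)$$

$$\text{Var}(X) = E[X^2] - (E[X])^2 = \phi''(0) - \left(\frac{\alpha}{\lambda}\right)^2 = \frac{\alpha(\alpha + 1)}{\lambda^2} - \frac{\alpha^2}{\lambda^2} = \frac{\alpha}{\lambda^2} \qquad (5.7.4)$$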



An important property of the gamma is that if $X_1$ and $X_2$ are independent gamma random variables having respective parameters $(\alpha_1, \lambda)$ and $(\alpha_2, \lambda)$, then $X_1 + X_2$ is a gamma random variable with parameters $(\alpha_1 + \alpha_2, \lambda)$. This result easily follows since

$$\begin{aligned}
\phi_{X_1 + X_2}(t) &= E[e^{t(X_1 + X_2)}] \qquad (5.7.5) \\
&= \phi_{X_1}(t)\,\phi_{X_2}(t) \qquad \text{by independence} \\
&= \left(\frac{\lambda}{\lambda - t}\right)^{\alpha_1}\left(\frac{\lambda}{\lambda - t}\right)^{\alpha_2} \qquad \text{from Equation 5.7.2} \\
&= \left(\frac{\lambda}{\lambda - t}\right)^{\alpha_1 + \alpha_2}
\end{aligned}$$

which is seen to be the moment generating function of a gamma $(\alpha_1 + \alpha_2, \lambda)$ random variable. Since a moment generating function uniquely characterizes a distribution, the result follows.

The foregoing result easily generalizes to yield the following proposition.

PROPOSITION 5.7.1 If $X_i$, $i = 1, \ldots, n$, are independent gamma random variables with respective parameters $(\alpha_i, \lambda)$, then $\sum_{i=1}^n X_i$ is gamma with parameters $\left(\sum_{i=1}^n \alpha_i, \lambda\right)$. ■
FIGURE 5.11 Graphs of the gamma $(\alpha, 1)$ density for (a) $\alpha = .5, 2, 3, 4, 5$ and (b) $\alpha = 50$.






Since the gamma distribution with parameters $(1, \lambda)$ reduces to the exponential with rate $\lambda$, we have thus shown the following useful result.

Corollary 5.7.2

If $X_1, \ldots, X_n$ are independent exponential random variables, each having rate $\lambda$, then $\sum_{i=1}^n X_i$ is a gamma random variable with parameters $(n, \lambda)$.

EXAMPLE 5.7a The lifetime of a battery is exponentially distributed with rate $\lambda$. If a stereo cassette requires one battery to operate, then the total playing time one can obtain from a total of $n$ batteries is a gamma random variable with parameters $(n, \lambda)$. ■
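Corollary 5.7.2 and Equations 5.7.3 and 5.7.4 can be checked by simulating the total playing time of $n$ batteries. The values $n = 4$ and $\lambda = 0.5$ are illustrative assumptions. For comparison the sketch also draws directly from the gamma distribution with `random.gammavariate`, whose second argument is a scale parameter, so it is set to $1/\lambda$ rather than $\lambda$.

```python
import random
import statistics

n, lam = 4, 0.5          # number of batteries and rate (illustrative assumptions)
rng = random.Random(8)
runs = 100_000

# Total playing time from n batteries: a sum of n independent exponentials with rate lam
totals = [sum(rng.expovariate(lam) for _ in range(n)) for _ in range(runs)]
print(statistics.fmean(totals), n / lam)             # Equation 5.7.3: alpha/lambda = 8
print(statistics.pvariance(totals), n / lam ** 2)    # Equation 5.7.4: alpha/lambda^2 = 16

# Direct gamma draws: random.gammavariate(alpha, beta) takes shape alpha and scale beta
direct = [rng.gammavariate(n, 1 / lam) for _ in range(runs)]
print(statistics.fmean(direct), statistics.pvariance(direct))
```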

Figure 5.11 presents a graph of the gamma $(\alpha, 1)$ density for a variety of values of $\alpha$. It should be noted that as $\alpha$ becomes large, the density starts to resemble the normal density. This is theoretically explained by the central limit theorem, which will be presented in the next chapter.

5.8 DISTRIBUTIONS ARISING FROM THE NORMAL 
5.8.1 The Chi-Square Distribution 

Definition

If $Z_1, Z_2, \ldots, Z_n$ are independent standard normal random variables, then $X$, defined by

$$X = Z_1^2 + Z_2^2 + \cdots + Z_n^2 \qquad (5.8.1)$$

is said to have a chi-square distribution with $n$ degrees of freedom. We will use the notation

$$X \sim \chi^2_n$$

to signify that $X$ has a chi-square distribution with $n$ degrees of freedom.

The chi-square distribution has the additive property that if $X_1$ and $X_2$ are independent chi-square random variables with $n_1$ and $n_2$ degrees of freedom, respectively, then $X_1 + X_2$ is chi-square with $n_1 + n_2$ degrees of freedom. This can be formally shown either by the use of moment generating functions or, most easily, by noting that $X_1 + X_2$ is the sum of squares of $n_1 + n_2$ independent standard normals and thus has a chi-square distribution with $n_1 + n_2$ degrees of freedom.

If $X$ is a chi-square random variable with $n$ degrees of freedom, then for any $\alpha \in (0, 1)$, the quantity $\chi^2_{\alpha,n}$ is defined to be such that

$$P\{X \ge \chi^2_{\alpha,n}\} = \alpha$$

This is illustrated in Figure 5.12.
FIGURE 5.12 The chi-square density function with 8 degrees of freedom (the area to the right of $\chi^2_{\alpha,n}$ equals $\alpha$).

In Table A2 of the Appendix, we list $\chi^2_{\alpha,n}$ for a variety of values of $\alpha$ and $n$ (including all those needed to solve problems and examples in this text). In addition, Programs 5.8.1a and 5.8.1b on the text disk can be used to obtain chi-square probabilities and the values of $\chi^2_{\alpha,n}$.

EXAMPLE 5.8a Determine $P\{\chi^2_{26} \le 30\}$ when $\chi^2_{26}$ is a chi-square random variable with 26 degrees of freedom.

SOLUTION Using Program 5.8.1a gives the result

$$P\{\chi^2_{26} \le 30\} = .7325$$ ■

EXAMPLE 5.8b Find $\chi^2_{.05,15}$.

SOLUTION Use Program 5.8.1b to obtain:

$$\chi^2_{.05,15} = 24.996$$ ■
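Programs 5.8.1a and 5.8.1b are not reproduced here. As a stand-in, the sketch below uses the fact, established in Section 5.8.1.1, that the chi-square distribution with $n$ degrees of freedom is the gamma distribution with $\alpha = n/2$ and $\lambda = 1/2$. It evaluates that distribution function with the standard power series for the regularized lower incomplete gamma function and finds $\chi^2_{\alpha,n}$ by bisection. The function names are our own. It gives about .7324 for Example 5.8a (the .7325 above differs only in the fourth decimal place) and 24.996 for Example 5.8b.

```python
import math


def chi2_cdf(x, n, terms=1000):
    """P{chi^2_n <= x}, computed as the gamma(n/2, 1/2) distribution function.

    Uses the series  P(a, y) = y^a e^{-y} / Gamma(a) * sum_k y^k / (a (a+1) ... (a+k)).
    """
    if x <= 0:
        return 0.0
    a, y = n / 2, x / 2
    term = 1.0 / a
    total = term
    for k in range(1, terms):
        term *= y / (a + k)
        total += term
        if term < 1e-16 * total:          # remaining terms are negligible
            break
    return math.exp(a * math.log(y) - y - math.lgamma(a)) * total


def chi2_upper_point(alpha, n):
    """chi^2_{alpha,n}: the value with P{chi^2_n >= value} = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while 1 - chi2_cdf(hi, n) > alpha:    # widen the bracket until the upper tail is small enough
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if 1 - chi2_cdf(mid, n) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(chi2_cdf(30, 26), 4))            # Example 5.8a, about 0.7324
print(round(chi2_upper_point(0.05, 15), 3))  # Example 5.8b, 24.996
```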

EXAMPLE 5.8c Suppose that we are attempting to locate a target in three-dimensional space, and that the three coordinate errors (in meters) of the point chosen are independent normal random variables with mean 0 and standard deviation 2. Find the probability that the distance between the point chosen and the target exceeds 3 meters.

SOLUTION If $D$ is the distance, then

$$D^2 = X_1^2 + X_2^2 + X_3^2$$

where $X_i$ is the error in the $i$th coordinate. Since $Z_i = X_i/2$, $i = 1, 2, 3$, are all standard normal random variables, it follows that

$$P\{D^2 > 9\} = P\{Z_1^2 + Z_2^2 + Z_3^2 > 9/4\} = P\{\chi^2_3 > 9/4\} = .5222$$

where the final equality was obtained from Program 5.8.1a. ■
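Example 5.8c can be checked two ways. For three degrees of freedom the chi-square tail has the closed form $P\{\chi^2_3 > x\} = 2[1 - \Phi(\sqrt{x})] + \sqrt{2x/\pi}\,e^{-x/2}$, and the distance itself can be simulated from three normal coordinate errors with standard deviation 2. The helper name, seed, and number of runs are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist


def chi2_3_tail(x):
    # Closed form of P{chi^2_3 > x}: 2(1 - Phi(sqrt x)) + sqrt(2x/pi) e^{-x/2}
    r = math.sqrt(x)
    return 2 * (1 - NormalDist().cdf(r)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


rng = random.Random(4)
runs = 200_000
far = sum(
    math.sqrt(sum(rng.gauss(0, 2) ** 2 for _ in range(3))) > 3   # distance from target > 3
    for _ in range(runs)
) / runs
print(round(chi2_3_tail(9 / 4), 4), far)   # 0.5222 and a simulated value close to it
```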
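*5.8.1.1 THE RELATION BETWEEN CHI-SQUARE AND GAMMA RANDOM VARIABLES

Let us compute the moment generating function of a chi-square random variable with $n$ degrees of freedom. To begin, we have, when $n = 1$, that

$$\begin{aligned}
E[e^{tX}] &= E[e^{tZ^2}] \qquad \text{where } Z \sim \mathcal{N}(0, 1) \\
&= \int_{-\infty}^{\infty} e^{tx^2}f_Z(x)\,dx \\
&= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{tx^2}e^{-x^2/2}\,dx \\
&= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-x^2/2\bar{\sigma}^2}\,dx \qquad \text{where } \bar{\sigma}^2 = (1 - 2t)^{-1} \\
&= (1 - 2t)^{-1/2}\,\frac{1}{\sqrt{2\pi}\,\bar{\sigma}}\int_{-\infty}^{\infty} e^{-x^2/2\bar{\sigma}^2}\,dx \\
&= (1 - 2t)^{-1/2} \qquad (5.8.2)
\end{aligned}$$

where the last equality follows since the integral of the normal $(0, \bar{\sigma}^2)$ density equals 1. Hence, in the general case of $n$ degrees of freedom

$$\begin{aligned}
E[e^{tX}] &= E\left[e^{t\sum_{i=1}^n Z_i^2}\right] \\
&= E\left[\prod_{i=1}^n e^{tZ_i^2}\right] \\
&= \prod_{i=1}^n E[e^{tZ_i^2}] \qquad \text{by independence of the } Z_i \\
&= (1 - 2t)^{-n/2} \qquad \text{from Equation 5.8.2}
\end{aligned}$$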



However, we recognize $[1/(1 - 2t)]^{n/2}$ as being the moment generating function of a
gamma random variable with parameters $(n/2, 1/2)$. Hence, by the uniqueness of moment
generating functions, it follows that these two distributions (chi-square with $n$ degrees
of freedom and gamma with parameters $n/2$ and $1/2$) are identical, and thus we can
conclude that the density of $X$ is given by

$$f(x) = \frac{\frac{1}{2} e^{-x/2} \left(\frac{x}{2}\right)^{(n/2) - 1}}{\Gamma\left(\frac{n}{2}\right)}, \qquad x > 0$$

FIGURE 5.13 The chi-square density function with n degrees of freedom.



The chi-square density functions having 1, 3, and 10 degrees of freedom, respectively, 
are plotted in Figure 5.13. 
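Equation 5.8.2 and the product step that follows it can be checked numerically. The sketch below approximates $E[e^{tZ^2}]$ with the composite Simpson rule, which is our own choice of quadrature, on a truncated interval. It then raises the result to the $n$th power, as independence allows, and compares the outcome with $(1 - 2t)^{-n/2}$. The name `mgf_chi2_numeric` and the truncation width are illustrative assumptions.

```python
import math


def mgf_chi2_numeric(t: float, n: int, half_width: float = 40.0, steps: int = 20_000) -> float:
    """Approximate E[e^{t chi^2_n}] as (E[e^{t Z^2}])^n for t < 1/2."""
    if t >= 0.5:
        raise ValueError("E[exp(t Z^2)] is finite only for t < 1/2")

    def integrand(x: float) -> float:
        # e^{t x^2} times the standard normal density
        return math.exp((t - 0.5) * x * x) / math.sqrt(2.0 * math.pi)

    h = 2.0 * half_width / steps  # composite Simpson's rule; steps must be even
    s = integrand(-half_width) + integrand(half_width)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * integrand(-half_width + i * h)
    one_dim = s * h / 3.0  # approximates E[e^{t Z^2}] = (1 - 2t)^{-1/2}
    return one_dim ** n    # independence: product of n identical factors


for t in (-1.0, 0.1, 0.3):
    print(t, mgf_chi2_numeric(t, 4), (1 - 2 * t) ** (-4 / 2))
```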

Let us reconsider Example 5.8c, this time supposing that the target is located in the 
two-dimensional plane. 

EXAMPLE 5.8d When we attempt to locate a target in two-dimensional space, suppose that
the coordinate errors are independent normal random variables with mean 0 and standard
deviation 2. Find the probability that the distance between the point chosen and the target
exceeds 3.

SOLUTION If $D$ is the distance and $X_i$, $i = 1, 2$, are the coordinate errors, then

$$D^2 = X_1^2 + X_2^2$$

Since $Z_i = X_i/2$, $i = 1, 2$, are standard normal random variables, we obtain

$$P\{D^2 > 9\} = P\{Z_1^2 + Z_2^2 > 9/4\} = P\{\chi^2_2 > 9/4\} = e^{-9/8} \approx .3247$$

where the preceding calculation used the fact that the chi-square distribution with 
2 degrees of freedom is the same as the exponential distribution with parameter 1/2. ■ 
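Because $\chi^2_2$ is exponential with rate $1/2$, its upper tail is simply $e^{-x/2}$. The sketch below compares that formula with a direct simulation of $Z_1^2 + Z_2^2$. The function name, the seed, and the trial count are illustrative choices only.

```python
import math
import random


def simulate_chi2_2_tail(x: float, trials: int = 200_000, seed: int = 7) -> float:
    """Monte Carlo estimate of P{Z_1^2 + Z_2^2 > x} for independent standard normals."""
    rng = random.Random(seed)
    count = 0
    for _ in range(trials):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        count += z1 * z1 + z2 * z2 > x
    return count / trials


x = 9 / 4
exact = math.exp(-x / 2)  # chi-square with 2 degrees of freedom is exponential with rate 1/2
print(round(exact, 4), simulate_chi2_2_tail(x))
```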

Since the chi-square distribution with $n$ degrees of freedom is identical to the gamma
distribution with parameters $\alpha = n/2$ and $\lambda = 1/2$, it follows from Equations 5.7.3
and 5.7.4 that the mean and variance of a random variable $X$ having this distribution are

$$E[X] = n, \qquad \text{Var}(X) = 2n$$
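As a numerical illustration, the sketch below codes the chi-square density in its gamma form and integrates it with a simple midpoint rule. This is a quadrature choice of the sketch, and the upper cutoff of 200 is an assumption that is ample for small $n$. The total mass, mean, and variance it returns should come out close to 1, $n$, and $2n$. The sketch is written for $n \ge 2$; for $n = 1$ the density is unbounded near 0 and a plain midpoint rule converges slowly.

```python
import math


def chi2_density(x: float, n: int) -> float:
    """Chi-square density with n degrees of freedom, written as a gamma(n/2, 1/2) density."""
    if x <= 0:
        return 0.0
    a = n / 2.0
    return 0.5 * math.exp(-x / 2.0) * (x / 2.0) ** (a - 1.0) / math.gamma(a)


def moments(n: int, upper: float = 200.0, steps: int = 100_000):
    """Midpoint-rule approximations of the total mass, mean, and variance on (0, upper)."""
    h = upper / steps
    mass = mean = second = 0.0
    for i in range(steps):
        x = (i + 0.5) * h  # midpoints never evaluate the density at x = 0
        w = chi2_density(x, n) * h
        mass += w
        mean += x * w
        second += x * x * w
    return mass, mean, second - mean ** 2


for n in (3, 10):
    print(n, moments(n))  # roughly (1, n, 2n)
```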

5.8.2 The t-Distribution

If $Z$ and $\chi^2_n$ are independent random variables, with $Z$ having a standard normal distribution
and $\chi^2_n$ having a chi-square distribution with $n$ degrees of freedom, then the random
variable $T_n$ defined by

$$T_n = \frac{Z}{\sqrt{\chi^2_n / n}}$$

is said to have a t-distribution with $n$ degrees of freedom. A graph of the density function of
$T_n$ is given in Figure 5.14 for $n = 1, 5$, and 10.

Like the standard normal density, the t-density is symmetric about zero. In addition, as $n$
becomes larger, it becomes more and more like a standard normal density. To understand
why, recall that $\chi^2_n$ can be expressed as the sum of the squares of $n$ standard normals,
and so

$$\frac{\chi^2_n}{n} = \frac{Z_1^2 + \cdots + Z_n^2}{n}$$

where $Z_1, \ldots, Z_n$ are independent standard normal random variables. It now follows from
the weak law of large numbers that, for large $n$, $\chi^2_n/n$ will, with probability close to 1,
be approximately equal to $E[Z_i^2] = 1$. Hence, for $n$ large, $T_n = Z/\sqrt{\chi^2_n/n}$ will have
approximately the same distribution as $Z$.
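The weak-law argument above can be watched in a simulation. The sketch below builds $T_n = Z/\sqrt{\chi^2_n/n}$ exactly as defined, with $\chi^2_n$ formed as a sum of $n$ squared standard normals. It estimates $P\{T_n \le 1.4\}$ for several $n$ and prints the estimates next to $P\{Z \le 1.4\}$. The evaluation point, the seed, and the trial count are arbitrary choices made for this illustration.

```python
import math
import random


def simulate_t_cdf(x: float, n: int, trials: int = 20_000, seed: int = 3) -> float:
    """Estimate P{T_n <= x} by simulating T_n = Z / sqrt(chi^2_n / n)."""
    rng = random.Random(seed)
    count = 0
    for _ in range(trials):
        z = rng.gauss(0.0, 1.0)
        # chi^2_n built, as in the text, as a sum of n squared standard normals
        chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))
        count += z / math.sqrt(chi2 / n) <= x
    return count / trials


normal_value = 0.5 * (1.0 + math.erf(1.4 / math.sqrt(2.0)))  # P{Z <= 1.4}
for n in (2, 10, 30):
    print(n, simulate_t_cdf(1.4, n), round(normal_value, 4))
```

As $n$ grows, the simulated values should drift up toward the normal value, reflecting the thinner tails of the t-density for large $n$.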

Figure 5.15 shows a graph of the t-density function with 5 degrees of freedom
compared with the standard normal density. Notice that the t-density has thicker "tails,"
indicating greater variability, than does the normal density.

FIGURE 5.14 Density function of $T_n$ ($n = 1, 5, 10$).






FIGURE 5.15 Comparing the standard normal density with the density of $T_5$.



The mean and variance of $T_n$ can be shown to equal

$$E[T_n] = 0, \quad n > 1; \qquad \text{Var}(T_n) = \frac{n}{n - 2}, \quad n > 2$$



Thus the variance of $T_n$ decreases to 1 (the variance of a standard normal random
variable) as $n$ increases to $\infty$. For $\alpha$, $0 < \alpha < 1$, let $t_{\alpha,n}$ be such that

$$P\{T_n \ge t_{\alpha,n}\} = \alpha$$

It follows from the symmetry about zero of the t-density function that $-T_n$ has the same
distribution as $T_n$, and so

$$
\begin{aligned}
\alpha &= P\{-T_n \ge t_{\alpha,n}\} \\
&= P\{T_n \le -t_{\alpha,n}\} \\
&= 1 - P\{T_n > -t_{\alpha,n}\}
\end{aligned}
$$

Therefore,

$$P\{T_n \ge -t_{\alpha,n}\} = 1 - \alpha$$

leading to the conclusion that

$$-t_{\alpha,n} = t_{1-\alpha,n}$$

which is illustrated in Figure 5.16.

The values of $t_{\alpha,n}$ for a variety of values of $n$ and $\alpha$ have been tabulated in Table A3
in the Appendix. In addition, Programs 5.8.2a and 5.8.2b on the text disk compute the
t-distribution function and the values $t_{\alpha,n}$, respectively.
FIGURE 5.16 $t_{1-\alpha,n} = -t_{\alpha,n}$ (each shaded tail has area $\alpha$).



EXAMPLE 5.8e Find (a) $P\{T_{12} < 1.4\}$ and (b) $t_{.025,9}$.

SOLUTION Run Programs 5.8.2a and 5.8.2b to obtain the results. 

(a) .9066 (b) 2.2625 ■ 
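As a stand-in for Programs 5.8.2a and 5.8.2b, the sketch below integrates the standard t-density with Simpson's rule. It exploits the symmetry about zero, so only the integral from 0 to $|x|$ is needed, and it finds $t_{\alpha,n}$ by bisection, using $t_{\alpha,n} = -t_{1-\alpha,n}$ when $\alpha > 1/2$. The function names and step counts are hypothetical choices of this sketch. Its results should agree with the values above to about three decimal places.

```python
import math


def t_density(x: float, n: int) -> float:
    """Density of the t-distribution with n degrees of freedom."""
    c = math.exp(math.lgamma((n + 1) / 2) - math.lgamma(n / 2)) / math.sqrt(n * math.pi)
    return c * (1.0 + x * x / n) ** (-(n + 1) / 2)


def t_cdf(x: float, n: int, steps: int = 2_000) -> float:
    """P{T_n <= x}: 1/2 plus or minus a Simpson-rule integral of the density over [0, |x|]."""
    a = abs(x)
    h = a / steps
    s = t_density(0.0, n) + t_density(a, n)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_density(i * h, n)
    half_area = s * h / 3.0
    return 0.5 + half_area if x >= 0 else 0.5 - half_area  # symmetry about zero


def t_upper_quantile(alpha: float, n: int) -> float:
    """Return t_{alpha,n} with P{T_n >= t_{alpha,n}} = alpha, for 0 < alpha < 1."""
    if alpha > 0.5:
        return -t_upper_quantile(1.0 - alpha, n)  # t_{alpha,n} = -t_{1-alpha,n}
    lo, hi = 0.0, 1.0
    while 1.0 - t_cdf(hi, n) > alpha:  # widen the bracket
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if 1.0 - t_cdf(mid, n) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(round(t_cdf(1.4, 12), 4))              # compare with Example 5.8e(a)
print(round(t_upper_quantile(0.025, 9), 4))  # compare with Example 5.8e(b)
```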

5.8.3 The F-Distribution

If $\chi^2_n$ and $\chi^2_m$ are independent chi-square random variables with $n$ and $m$ degrees of freedom,
respectively, then the random variable $F_{n,m}$ defined by

$$F_{n,m} = \frac{\chi^2_n / n}{\chi^2_m / m}$$

is said to have an F-distribution with $n$ and $m$ degrees of freedom.
For any $\alpha \in (0, 1)$, let $F_{\alpha,n,m}$ be such that

$$P\{F_{n,m} > F_{\alpha,n,m}\} = \alpha$$

This is illustrated in Figure 5.17.

FIGURE 5.17 Density function of $F_{n,m}$ (the shaded right tail beyond $F_{\alpha,n,m}$ has area $\alpha$).

The quantities $F_{\alpha,n,m}$ are tabulated in Table A4 of the Appendix for different values
of $n$, $m$, and $\alpha \le 1/2$. If $F_{\alpha,n,m}$ is desired when $\alpha > 1/2$, it can be obtained by using the
following equalities:



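$$
\begin{aligned}
\alpha &= P\left\{\frac{\chi^2_n/n}{\chi^2_m/m} > F_{\alpha,n,m}\right\} \\
&= P\left\{\frac{\chi^2_m/m}{\chi^2_n/n} < \frac{1}{F_{\alpha,n,m}}\right\} \\
&= 1 - P\left\{\frac{\chi^2_m/m}{\chi^2_n/n} \ge \frac{1}{F_{\alpha,n,m}}\right\}
\end{aligned}
$$

or, equivalently,

$$P\left\{\frac{\chi^2_m/m}{\chi^2_n/n} \ge \frac{1}{F_{\alpha,n,m}}\right\} = 1 - \alpha \tag{5.8.3}$$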

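But because $(\chi^2_m/m)/(\chi^2_n/n)$ has an F-distribution with degrees of freedom $m$ and $n$, it
follows that

$$1 - \alpha = P\left\{\frac{\chi^2_m/m}{\chi^2_n/n} \ge F_{1-\alpha,m,n}\right\}$$

implying, from Equation 5.8.3, that

$$\frac{1}{F_{\alpha,n,m}} = F_{1-\alpha,m,n}$$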



Thus, for instance, $F_{.9,5,7} = 1/F_{.1,7,5} = 1/3.37 = .2967$ where the value of $F_{.1,7,5}$ was
obtained from Table A4 of the Appendix.

Program 5.8.3 computes the distribution function of $F_{n,m}$.

EXAMPLE 5.8f Determine $P\{F_{6,14} < 1.5\}$.

SOLUTION Run Program 5.8.3 to obtain the solution .7518. ■ 
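A rough stand-in for Program 5.8.3 is sketched below, and it also checks the reciprocal relation $1/F_{\alpha,n,m} = F_{1-\alpha,m,n}$ numerically. It integrates the standard F-density with Simpson's rule, which is a quadrature choice of this sketch, and finds quantiles by bisection. To keep the quadrature simple it assumes $n > 2$, so that the density is finite and equals 0 at $x = 0$. The function names are hypothetical. The printed probability should land close to the .7518 reported above, and the quantile close to the tabled 3.37.

```python
import math


def f_density(x: float, n: int, m: int) -> float:
    """Density of F_{n,m} at x (zero for x <= 0)."""
    if x <= 0:
        return 0.0
    log_c = (math.lgamma((n + m) / 2) - math.lgamma(n / 2) - math.lgamma(m / 2)
             + (n / 2) * math.log(n / m))
    return math.exp(log_c + (n / 2 - 1) * math.log(x) - ((n + m) / 2) * math.log1p(n * x / m))


def f_cdf(x: float, n: int, m: int, steps: int = 1_000) -> float:
    """P{F_{n,m} <= x} by the composite Simpson rule on [0, x]; assumes n > 2."""
    if x <= 0:
        return 0.0
    h = x / steps
    s = f_density(0.0, n, m) + f_density(x, n, m)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * f_density(i * h, n, m)
    return s * h / 3.0


def f_upper_quantile(alpha: float, n: int, m: int) -> float:
    """Return F_{alpha,n,m} with P{F_{n,m} > F_{alpha,n,m}} = alpha, by bisection."""
    lo, hi = 0.0, 1.0
    while 1.0 - f_cdf(hi, n, m) > alpha:  # widen the bracket
        hi *= 2.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if 1.0 - f_cdf(mid, n, m) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(round(f_cdf(1.5, 6, 14), 4))            # compare with Example 5.8f
upper = f_upper_quantile(0.1, 7, 5)           # F_{.1,7,5}
print(round(upper, 3), round(f_upper_quantile(0.9, 5, 7), 4), round(1 / upper, 4))
```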

*5.9 THE LOGISTICS DISTRIBUTION 

A random variable $X$ is said to have a logistics distribution with parameters $\mu$ and $v > 0$ if
its distribution function is

$$F(x) = \frac{e^{(x-\mu)/v}}{1 + e^{(x-\mu)/v}}, \qquad -\infty < x < \infty$$

* Optional section.
Differentiating $F(x) = 1 - 1/(1 + e^{(x-\mu)/v})$ yields the density function

$$f(x) = \frac{e^{(x-\mu)/v}}{v\left(1 + e^{(x-\mu)/v}\right)^2}, \qquad -\infty < x < \infty$$

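To obtain the mean of a logistics random variable,

$$E[X] = \int_{-\infty}^{\infty} x \frac{e^{(x-\mu)/v}}{v\left(1 + e^{(x-\mu)/v}\right)^2}\,dx$$

make the substitution $y = (x - \mu)/v$. This yields

$$
\begin{aligned}
E[X] &= v \int_{-\infty}^{\infty} \frac{y e^y}{(1 + e^y)^2}\,dy + \mu \int_{-\infty}^{\infty} \frac{e^y}{(1 + e^y)^2}\,dy \\
&= v \int_{-\infty}^{\infty} \frac{y e^y}{(1 + e^y)^2}\,dy + \mu
\end{aligned}
\tag{5.9.1}
$$

where the preceding equality used that $e^y/(1 + e^y)^2$ is the density function of a logistic
random variable with parameters $\mu = 0$, $v = 1$ (such a random variable is called a standard
logistic) and thus integrates to 1. Now,

$$
\begin{aligned}
\int_{-\infty}^{\infty} \frac{y e^y}{(1 + e^y)^2}\,dy &= \int_{-\infty}^{0} \frac{y e^y}{(1 + e^y)^2}\,dy + \int_{0}^{\infty} \frac{y e^y}{(1 + e^y)^2}\,dy \\
&= -\int_{0}^{\infty} \frac{x e^{-x}}{(1 + e^{-x})^2}\,dx + \int_{0}^{\infty} \frac{y e^y}{(1 + e^y)^2}\,dy \\
&= -\int_{0}^{\infty} \frac{x e^{x}}{(e^{x} + 1)^2}\,dx + \int_{0}^{\infty} \frac{y e^y}{(1 + e^y)^2}\,dy \\
&= 0
\end{aligned}
\tag{5.9.2}
$$

where the second equality is obtained by making the substitution $x = -y$, and the third
by multiplying the numerator and denominator by $e^{2x}$. From Equations 5.9.1 and 5.9.2
we obtain

$$E[X] = \mu$$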

Thus $\mu$ is the mean of the logistic; $v$ is called the dispersion parameter.
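The conclusion $E[X] = \mu$ can be checked in two ways, as in the sketch below. The first is a midpoint-rule integral of $x f(x)$. It uses $f(x) = F(x)[1 - F(x)]/v$, which follows directly from the formulas for $F$ and $f$ above. The second is the average of simulated values. The simulation uses inverse-transform sampling, $X = \mu + v \log[U/(1 - U)]$ with $U$ uniform on $(0, 1)$, which is a technique brought in for this sketch and not one used in the text at this point. The parameter values $\mu = 2$, $v = .5$ and all function names are illustrative.

```python
import math
import random


def logistic_cdf(x: float, mu: float, v: float) -> float:
    """F(x) = e^{(x-mu)/v} / (1 + e^{(x-mu)/v}), rewritten as 1 / (1 + e^{-(x-mu)/v})."""
    return 1.0 / (1.0 + math.exp(-(x - mu) / v))


def logistic_sample(rng: random.Random, mu: float, v: float) -> float:
    """Inverse-transform sample: solving F(x) = u gives x = mu + v*log(u/(1-u))."""
    u = rng.random()
    while u == 0.0:  # random() may return 0.0, where the log is undefined
        u = rng.random()
    return mu + v * math.log(u / (1.0 - u))


def numeric_mean(mu: float, v: float, half_width: float = 60.0, steps: int = 60_000) -> float:
    """Midpoint-rule approximation of E[X] = integral of x f(x), with f = F(1 - F)/v."""
    h = 2.0 * half_width * v / steps
    total = 0.0
    for i in range(steps):
        x = mu - half_width * v + (i + 0.5) * h
        F = logistic_cdf(x, mu, v)
        total += x * F * (1.0 - F) / v * h
    return total


rng = random.Random(11)
draws = [logistic_sample(rng, 2.0, 0.5) for _ in range(100_000)]
print(numeric_mean(2.0, 0.5), sum(draws) / len(draws))  # both should be near mu = 2
```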
Problems

1. A satellite system consists of 4 components and can function adequately if at 
least 2 of the 4 components are in working condition. If each component is, 
independently, in working condition with probability .6, what is the probability 
that the system functions adequately? 

2. A communications channel transmits the digits 0 and 1. However, due to static,
the digit transmitted is incorrectly received with probability .2. Suppose that we
want to transmit an important message consisting of one binary digit. To reduce
the chance of error, we transmit 00000 instead of 0 and 11111 instead of 1. If the
receiver of the message uses "majority" decoding, what is the probability that the
message will be incorrectly decoded? What independence assumptions are you
making? (By majority decoding we mean that the message is decoded as "0" if
there are at least three zeros in the message received and as "1" otherwise.)

3. If each voter is for Proposition A with probability .7, what is the probability that 
exactly 7 of 10 voters are for this proposition? 

4. Suppose that a particular trait (such as eye color or left-handedness) of a person 
is classified on the basis of one pair of genes, and suppose that d represents a 
dominant gene and r a recessive gene. Thus, a person with dd genes is pure 
dominance, one with rr is pure recessive, and one with rd is hybrid. The pure 
dominance and the hybrid are alike in appearance. Children receive 1 gene from 
each parent. If, with respect to a particular trait, 2 hybrid parents have a total 
of 4 children, what is the probability that 3 of the 4 children have the outward 
appearance of the dominant gene? 

5. At least one-half of an airplane's engines are required to function in order for it
to operate. If each engine independently functions with probability p, for what
values of p is a 4-engine plane more likely to operate than a 2-engine plane?

6. Let X be a binomial random variable with 

E[X] = 7 and Var(X) = 2.1 

Find 

(a) P{X = 4}; 

(b) P{X > 12}. 

7. If X and Y are binomial random variables with respective parameters (n, p) and
(n, 1 - p), verify and explain the following identities:

(a) P{X < i} = P{Y > n - i};
(b) P{X = k} = P{Y = n - k}.
8. If X is a binomial random variable with parameters n and p, where 0 < p < 1,
show that

(a) $P\{X = k + 1\} = \dfrac{p}{1 - p} \cdot \dfrac{n - k}{k + 1}\, P\{X = k\}$, $k = 0, 1, \ldots, n - 1$.

(b) As k goes from 0 to n, P{X = k} first increases and then decreases, reaching
its largest value when k is the largest integer less than or equal to (n + 1)p.

9. Derive the moment generating function of a binomial random variable and then 
use your result to verify the formulas for the mean and variance given in the text. 

10. Compare the Poisson approximation with the correct binomial probability for
the following cases:

(a) P{X = 2} when n = 10, p = .1;

(b) P{X = 0} when n = 10, p = .1;

(c) P{X = 4} when n = 9, p = .2.

11. If you buy a lottery ticket in 50 lotteries, in each of which your chance of winning
a prize is 1/100, what is the (approximate) probability that you will win a prize (a)
at least once, (b) exactly once, and (c) at least twice?



12. The number of times that an individual contracts a cold in a given year is a Poisson 
random variable with parameter λ = 3. Suppose a new wonder drug (based on
large quantities of vitamin C) has just been marketed that reduces the Poisson
parameter to λ = 2 for 75 percent of the population. For the other 25 percent of
the population, the drug has no appreciable effect on colds. If an individual tries 
the drug for a year and has colds in that time, how likely is it that the drug is 
beneficial for him or her? 

13. In the 1980s, an average of 121.95 workers died on the job each week. Give 
estimates of the following quantities: 

(a) the proportion of weeks having 130 deaths or more; 

(b) the proportion of weeks having 100 deaths or less. 

Explain your reasoning. 

14. Approximately 80,000 marriages took place in the state of New York last year. 
Estimate the probability that for at least one of these couples 

(a) both partners were born on April 30; 

(b) both partners celebrated their birthday on the same day of the year. 

State your assumptions. 

15. The game of frustration solitaire is played by turning the cards of a randomly 
shuffled deck of 52 playing cards over one at a time. Before you turn over the 
first card, say ace; before you turn over the second card, say two, before you turn 
over the third card, say three. Continue in this manner (saying ace again before 
turning over the fourteenth card, and so on.) You lose if you ever turn over a card
that matches what you have just said. Use the Poisson paradigm to approximate 
the probability of winning. (The actual probability is .01623.) 

16. The probability of error in the transmission of a binary digit over a communication 
channel is 1/10 . Write an expression for the exact probability of more than 3 
errors when transmitting a block of 10 bits. What is its approximate value? 
Assume independence. 

17. If X is a Poisson random variable with mean λ, show that P{X = i} first increases
and then decreases as i increases, reaching its maximum value when i is the largest
integer less than or equal to λ.

18. A contractor purchases a shipment of 100 transistors. It is his policy to test 10 
of these transistors and to keep the shipment only if at least 9 of the 10 are in 
working condition. If the shipment contains 20 defective transistors, what is the 
probability it will be kept? 

19. Let X denote a hypergeometric random variable with parameters n, m, and k.
That is,

$$P\{X = i\} = \frac{\binom{n}{i}\binom{m}{k - i}}{\binom{n + m}{k}}, \qquad i = 0, 1, \ldots, \min(k, n)$$

(a) Derive a formula for P{X = i} in terms of P{X = i - 1}.

(b) Use part (a) to compute P{X = i} for i = 0, 1, 2, 3, 4, 5 when n = m = 10,
k = 5, by starting with P{X = 0}.

(c) Based on the recursion in part (a), write a program to compute the
hypergeometric distribution function.

(d) Use your program from part (c) to compute P{X < 10} when n = m = 30,
k = 15.

20. Independent trials, each of which is a success with probability p, are successively
performed. Let X denote the first trial resulting in a success. That is, X will equal
k if the first k - 1 trials are all failures and the kth a success. X is called a geometric
random variable. Compute

(a) P{X = k}, k = 1, 2, ...;

(b) E[X].

Let Y denote the number of trials needed to obtain r successes. Y is called a
negative binomial random variable. Compute

(c) P{Y = k}, k = r, r + 1, ....
(Hint: In order for Y to equal k, how many successes must result in the first k - 1
trials and what must be the outcome of trial k?)

(d) Show that

E[Y] = r/p

(Hint: Write Y = Y_1 + ... + Y_r where Y_i is the number of trials needed to go
from a total of i - 1 to a total of i successes.)

21. If U is uniformly distributed on (0, 1), show that a + (b - a)U is uniform
on (a, b).

22. You arrive at a bus stop at 10 o'clock, knowing that the bus will arrive at some 
time uniformly distributed between 10 and 10:30. What is the probability that 
you will have to wait longer than 10 minutes? If at 10:15 the bus has not yet 
arrived, what is the probability that you will have to wait at least an additional 
10 minutes? 

23. If X is a normal random variable with parameters μ = 10, σ² = 36, compute

(a) P{X > 5}; 

(b) P{4 < X < 16}; 

(c) P{X < 8}; 

(d) P{X < 20};

(e) P{X > 16}. 

24. The Scholastic Aptitude Test mathematics test scores across the population of 
high school seniors follow a normal distribution with mean 500 and standard 
deviation 100. If five seniors are randomly chosen, find the probability that 
(a) all scored below 600 and (b) exactly three of them scored above 640. 

25. The annual rainfall (in inches) in a certain region is normally distributed with
μ = 40, σ = 4. What is the probability that in 2 of the next 4 years the rainfall
will exceed 50 inches? Assume that the rainfalls in different years are independent.

26. The width of a slot of a duralumin forging is (in inches) normally distributed with
μ = .9000 and σ = .0030. The specification limits were given as .9000 ± .0050.
What percentage of forgings will be defective? What is the maximum allowable
value of σ that will permit no more than 1 in 100 defectives when the widths are
normally distributed with μ = .9000 and σ = .0030?

27. A certain type of lightbulb has an output that is normally distributed with mean 
2,000 end foot candles and standard deviation 85 end foot candles. Determine 
a lower specification limit L so that only 5 percent of the lightbulbs produced 
will be defective. (That is, determine L so that P{X > L} = .95, where X is the
output of a bulb.)

28. A manufacturer produces bolts that are specified to be between 1.19 and 
1.21 inches in diameter. If its production process results in a bolt's diameter 
being normally distributed with mean 1.20 inches and standard deviation .005,
what percentage of bolts will not meet specifications? 

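29. Let $I = \int_{-\infty}^{\infty} e^{-x^2/2}\,dx$.

(a) Show that for any μ and σ

$$\frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} e^{-(x-\mu)^2/2\sigma^2}\,dx = 1$$

is equivalent to $I = \sqrt{2\pi}$.

(b) Show that $I = \sqrt{2\pi}$ by writing

$$I^2 = \int_{-\infty}^{\infty} e^{-x^2/2}\,dx \int_{-\infty}^{\infty} e^{-y^2/2}\,dy = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(x^2+y^2)/2}\,dx\,dy$$

and then evaluating the double integral by means of a change of variables to
polar coordinates. (That is, let x = r cos θ, y = r sin θ, dx dy = r dr dθ.)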

30. A random variable X is said to have a lognormal distribution if log X is normally
distributed. If X is lognormal with E[log X] = μ and Var(log X) = σ²,
determine the distribution function of X. That is, what is P{X < x}?

31. The lifetimes of interactive computer chips produced by a certain semiconductor 
manufacturer are normally distributed having mean 4.4 x 10 hours with a 
standard deviation of 3 x 10 hours. If a mainframe manufacturer requires that 
at least 90 percent of the chips from a large batch will have lifetimes of at least 
4.0 x 10 hours, should he contract with the semiconductor firm? 

32. In Problem 31, what is the probability that a batch of 100 chips will contain at 
least 4 whose lifetimes are less than 3.8 x 10 hours? 

33. The lifetime of a color television picture tube is a normal random variable with 
mean 8.2 years and standard deviation 1.4 years. What percentage of such tubes 
lasts 

(a) more than 10 years; 

(b) less than 5 years; 

(c) between 5 and 10 years? 

34. The annual rainfall in Cincinnati is normally distributed with mean 40.14 inches
and standard deviation 8.7 inches. 

(a) What is the probability this year's rainfall will exceed 42 inches? 

(b) What is the probability that the sum of the next 2 years' rainfall will exceed 
84 inches? 

(c) What is the probability that the sum of the next 3 years' rainfall will exceed 
126 inches? 

(d) For parts (b) and (c), what independence assumptions are you making? 
35. The height of adult women in the United States is normally distributed with
mean 64.5 inches and standard deviation 2.4 inches. Find the probability that 
a randomly chosen woman is 

(a) less than 63 inches tall; 

(b) less than 70 inches tall; 

(c) between 63 and 70 inches tall. 

(d) Alice is 72 inches tall. What percentage of women is shorter than Alice? 

(e) Find the probability that the average of the heights of two randomly chosen 
women exceeds 66 inches. 

(f) Repeat part (e) for four randomly chosen women.

36. An IQ test produces scores that are normally distributed with mean value 100 
and standard deviation 14.2. The top 1 percent of all scores are in what range? 

37. The time (in hours) required to repair a machine is an exponentially distributed
random variable with parameter λ = 1.

(a) What is the probability that a repair time exceeds 2 hours? 

(b) What is the conditional probability that a repair takes at least 3 hours, given 
that its duration exceeds 2 hours? 

38. The number of years a radio functions is exponentially distributed with parameter
λ = 1/8. If Jones buys a used radio, what is the probability that it will be working
after an additional 10 years?

39. Jones figures that the total number of thousands of miles that a used auto can be
driven before it would need to be junked is an exponential random variable with
parameter 1/20. Smith has a used car that he claims has been driven only 10,000
miles. If Jones purchases the car, what is the probability that she would get at least 
20,000 additional miles out of it? Repeat under the assumption that the lifetime 
mileage of the car is not exponentially distributed but rather is (in thousands of 
miles) uniformly distributed over (0, 40). 

*40. Let X_1, X_2, ..., X_n denote the first n interarrival times of a Poisson process and
set $S_n = \sum_{i=1}^{n} X_i$.

(a) What is the interpretation of S_n?

(b) Argue that the two events {S_n ≤ t} and {N(t) ≥ n} are identical.

(c) Use part (b) to show that

$$P\{S_n \le t\} = 1 - \sum_{j=0}^{n-1} e^{-\lambda t} \frac{(\lambda t)^j}{j!}$$

(d) By differentiating the distribution function of S_n given in part (c), conclude
that S_n is a gamma random variable with parameters n and λ. (This result
also follows from Corollary 5.7.2.)

* From optional sections.
*41. Earthquakes occur in a given region in accordance with a Poisson process with
rate 5 per year.

(a) What is the probability there will be at least two earthquakes in the first half
of 2010?

(b) Assuming that the event in part (a) occurs, what is the probability that there
will be no earthquakes during the first 9 months of 2011?

(c) Assuming that the event in part (a) occurs, what is the probability that there 
will be at least four earthquakes over the first 9 months of the year 2010? 

*42. When shooting at a target in a two-dimensional plane, suppose that the horizontal
miss distance is normally distributed with mean 0 and variance 4 and is independent
of the vertical miss distance, which is also normally distributed with mean 0
and variance 4. Let D denote the distance between the point at which the shot
lands and the target.
Find E[D].

43. If X is a chi-square random variable with 6 degrees of freedom, find

(a) P{X < 6}; 

(b) P{3 < X < 9}. 

44. If X and Y are independent chi-square random variables with 3 and 6 degrees of
freedom, respectively, determine the probability that X + Y will exceed 10.

45. Show that Γ(1/2) = √π. (Hint: Evaluate $\int_0^{\infty} e^{-x} x^{-1/2}\,dx$ by letting x = y²/2,
dx = y dy.)

46. If T has a t-distribution with 8 degrees of freedom, find (a) P{T > 1},
(b) P{T < 2}, and (c) P{-1 < T < 1}.

47. If T_n has a t-distribution with n degrees of freedom, show that T_n² has an
F-distribution with 1 and n degrees of freedom.

48. Let Φ be the standard normal distribution function. If, for constants a
and b > 0,

$$P\{X < x\} = \Phi\left(\frac{x - a}{b}\right)$$

characterize the distribution of X.



* From optional sections. 




DISTRIBUTIONS OF SAMPLING 
STATISTICS 



6.1 INTRODUCTION 

The science of statistics deals with drawing conclusions from observed data. For instance, 
a typical situation in a technological study arises when one is confronted with a large 
collection, or population, of items that have measurable values associated with them. By 
suitably sampling from this collection, and then analyzing the sampled items, one hopes 
to be able to draw some conclusions about the collection as a whole. 

To use sample data to make inferences about an entire population, it is necessary to 
make some assumptions about the relationship between the two. One such assumption, 
which is often quite reasonable, is that there is an underlying (population) probability 
distribution such that the measurable values of the items in the population can be thought 
of as being independent random variables having this distribution. If the sample data 
are then chosen in a random fashion, then it is reasonable to suppose that they too are 
independent values from the distribution. 

Definition 

If $X_1, \ldots, X_n$ are independent random variables having a common distribution $F$, then
we say that they constitute a sample (sometimes called a random sample) from the
distribution $F$.

In most applications, the population distribution F will not be completely specified and
one will attempt to use the data to make inferences about F. Sometimes it will be supposed 
that F is specified up to some unknown parameters (for instance, one might suppose that 
F was a normal distribution function having an unknown mean and variance, or that it 
is a Poisson distribution function whose mean is not given), and at other times it might 
be assumed that almost nothing is known about F (except maybe for assuming that it is 
a continuous, or a discrete, distribution). Problems in which the form of the underlying 
distribution is specified up to a set of unknown parameters are called parametric inference 
problems, whereas those in which nothing is assumed about the form of F are called
nonparametric inference problems. 

EXAMPLE 6.1a Suppose that a new process has just been installed to produce computer
chips, and suppose that the successive chips produced by this new process will have useful
lifetimes that are independent with a common unknown distribution F. Physical reasons
sometimes suggest the parametric form of the distribution F; for instance, they may lead us
to believe that F is a normal distribution, or that F is an exponential distribution. In such
cases, we are confronted with a parametric statistical problem in which we would want
to use the observed data to estimate the parameters of F. For instance, if F were assumed
to be a normal distribution, then we would want to estimate its mean and variance; if F
were assumed to be exponential, we would want to estimate its mean. In other situations,
there might not be any physical justification for supposing that F has any particular form;
in this case the problem of making inferences about F would constitute a nonparametric
inference problem. ■

In this chapter, we will be concerned with the probability distributions of certain 
statistics that arise from a sample, where a statistic is a random variable whose value is 
determined by the sample data. Two important statistics that we will discuss are the sample 
mean and the sample variance. In Section 6.2, we consider the sample mean and derive 
its expectation and variance. We note that when the sample size is at least moderately 
large, the distribution of the sample mean is approximately normal. This follows from 
the central limit theorem, one of the most important theoretical results in probability, 
which is discussed in Section 6.3. In Section 6.4, we introduce the sample variance and 
determine its expected value. In Section 6.5, we suppose that the population distribution 
is normal and present the joint distribution of the sample mean and the sample variance. 
In Section 6.6, we suppose that we are sampling from a finite population of elements and 
explain what it means for the sample to be a "random sample." When the population size 
is large in relation to the sample size, we often treat it as if it were of infinite size; this is 
illustrated and its consequences are discussed. 

6.2 THE SAMPLE MEAN 

Consider a population of elements, each of which has a numerical value attached to it. 
For instance, the population might consist of the adults of a specified community and the 
value attached to each adult might be his or her annual income, or height, or age, and so 
on. We often suppose that the value associated with any member of the population can 
be regarded as being the value of a random variable having expectation $\mu$ and variance
$\sigma^2$. The quantities $\mu$ and $\sigma^2$ are called the population mean and the population variance,
respectively. Let $X_1, X_2, \ldots, X_n$ be a sample of values from this population. The sample
mean is defined by

$$\bar{X} = \frac{X_1 + \cdots + X_n}{n}$$
Since the value of the sample mean $\bar{X}$ is determined by the values of the random variables
in the sample, it follows that $\bar{X}$ is also a random variable. Its expected value and variance
are obtained as follows:

$$
\begin{aligned}
E[\bar{X}] &= E\left[\frac{X_1 + \cdots + X_n}{n}\right] \\
&= \frac{1}{n}\left(E[X_1] + \cdots + E[X_n]\right) \\
&= \mu
\end{aligned}
$$

and

$$
\begin{aligned}
\text{Var}(\bar{X}) &= \text{Var}\left(\frac{X_1 + \cdots + X_n}{n}\right) \\
&= \frac{1}{n^2}\left[\text{Var}(X_1) + \cdots + \text{Var}(X_n)\right] \quad \text{by independence} \\
&= \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}
\end{aligned}
$$

where $\mu$ and $\sigma^2$ are the population mean and variance, respectively. Hence, the expected
value of the sample mean is the population mean $\mu$, whereas its variance is $1/n$ times
the population variance. As a result, we can conclude that $\bar{X}$ is also centered about the
population mean $\mu$, but its spread becomes more and more reduced as the sample size
increases. Figure 6.1 plots the probability density function of the sample mean from
a standard normal population for a variety of sample sizes.




FIGURE 6.1 Densities of sample means from a standard normal population. 
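The two results $E[\bar{X}] = \mu$ and $\text{Var}(\bar{X}) = \sigma^2/n$ can be seen in a small simulation. The sketch below assumes, for illustration only, an exponential population with rate 1, so that $\mu = 1$ and $\sigma^2 = 1$. It draws many samples of size $n$, computes each sample mean, and reports the mean and variance of those sample means next to $\sigma^2/n$. The function name, seed, and replication count are hypothetical choices.

```python
import random
import statistics


def sample_mean_moments(n: int, reps: int = 10_000, seed: int = 5):
    """Mean and variance of `reps` simulated sample means of size n drawn from an
    exponential(1) population (population mean 1, population variance 1)."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]
    return statistics.fmean(means), statistics.pvariance(means)


for n in (1, 5, 25):
    m, var = sample_mean_moments(n)
    print(n, round(m, 3), round(var, 4), 1 / n)  # variance should track sigma^2 / n = 1/n
```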
6.3 THE CENTRAL LIMIT THEOREM

In this section, we will consider one of the most remarkable results in probability — 
namely, the central limit theorem. Loosely speaking, this theorem asserts that the sum of 
a large number of independent random variables has a distribution that is approximately 
normal. Hence, it not only provides a simple method for computing approximate prob- 
abilities for sums of independent random variables, but it also helps explain the remarkable 
fact that the empirical frequencies of so many natural populations exhibit a bell-shaped 
(that is, a normal) curve. 

In its simplest form, the central limit theorem is as follows: 

Theorem 6.3.1 The Central Limit Theorem 

Let $X_1, X_2, \ldots, X_n$ be a sequence of independent and identically distributed random
variables each having mean $\mu$ and variance $\sigma^2$. Then for $n$ large, the distribution of

$$X_1 + \cdots + X_n$$

is approximately normal with mean $n\mu$ and variance $n\sigma^2$.

It follows from the central limit theorem that

$$\frac{X_1 + \cdots + X_n - n\mu}{\sigma\sqrt{n}}$$

is approximately a standard normal random variable; thus, for $n$ large,

$$P\left\{\frac{X_1 + \cdots + X_n - n\mu}{\sigma\sqrt{n}} < x\right\} \approx P\{Z < x\}$$

where $Z$ is a standard normal random variable.
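To see the theorem at work, the sketch below assumes uniform $(0, 1)$ summands, for which $\mu = 1/2$ and $\sigma^2 = 1/12$. It estimates the probability that the standardized sum of 12 of them is at most $x$ and compares the result with $\Phi(x)$. The choice of summand distribution, of $n = 12$, and of the evaluation points is ours, made only for illustration.

```python
import math
import random


def phi(x: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def clt_check(n: int, x: float, reps: int = 20_000, seed: int = 9) -> float:
    """Estimate P{(X_1 + ... + X_n - n*mu) / (sigma*sqrt(n)) <= x} for uniform(0, 1) X_i."""
    rng = random.Random(seed)
    mu, sigma = 0.5, math.sqrt(1.0 / 12.0)  # mean and standard deviation of a uniform(0, 1)
    hits = 0
    for _ in range(reps):
        s = sum(rng.random() for _ in range(n))
        hits += (s - n * mu) / (sigma * math.sqrt(n)) <= x
    return hits / reps


for x in (-1.0, 0.0, 1.5):
    print(x, clt_check(12, x), round(phi(x), 4))
```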

EXAMPLE 6.3a An insurance company has 25,000 automobile policy holders. If the yearly 
claim of a policy holder is a random variable with mean 320 and standard deviation 540, 
approximate the probability that the total yearly claim exceeds 8.3 million. 

SOLUTION Let $X$ denote the total yearly claim. Number the policy holders, and let $X_i$
denote the yearly claim of policy holder $i$. With $n = 25{,}000$, we have from the central
limit theorem that $X = \sum_{i=1}^{n} X_i$ will have approximately a normal distribution with
mean $320 \times 25{,}000 = 8 \times 10^6$ and standard deviation $540\sqrt{25{,}000} = 8.5381 \times 10^4$.
Therefore,

$$
\begin{aligned}
P\{X > 8.3 \times 10^6\} &= P\left\{\frac{X - 8 \times 10^6}{8.5381 \times 10^4} > \frac{8.3 \times 10^6 - 8 \times 10^6}{8.5381 \times 10^4}\right\} \\
&= P\left\{\frac{X - 8 \times 10^6}{8.5381 \times 10^4} > \frac{.3 \times 10^6}{8.5381 \times 10^4}\right\} \\
&\approx P\{Z > 3.51\} \quad \text{where } Z \text{ is a standard normal} \\
&\approx .00023
\end{aligned}
$$

Thus, there are only 2.3 chances out of 10,000 that the total yearly claim will exceed 
8.3 million. ■ 
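The arithmetic of Example 6.3a is reproduced in the short sketch below, with the normal tail taken from `math.erfc`. Because it evaluates the tail at the unrounded $z \approx 3.514$, the probability it prints is about $2.2 \times 10^{-4}$, slightly below the table-based .00023.

```python
import math

n = 25_000
mean_claim, sd_claim = 320.0, 540.0
threshold = 8.3e6

mean_total = n * mean_claim                 # 8 x 10^6
sd_total = sd_claim * math.sqrt(n)          # about 8.5381 x 10^4
z = (threshold - mean_total) / sd_total     # about 3.51
tail = 0.5 * math.erfc(z / math.sqrt(2.0))  # P{Z > z} for a standard normal Z
print(mean_total, round(sd_total, 1), round(z, 4), tail)
```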

EXAMPLE 6.3b Civil engineers believe that W, the amount of weight (in units of 
1,000 pounds) that a certain span of a bridge can withstand without structural dam- 
age resulting, is normally distributed with mean 400 and standard deviation 40. Suppose 
that the weight (again, in units of 1,000 pounds) of a car is a random variable with mean 
3 and standard deviation .3. How many cars would have to be on the bridge span for the
probability of structural damage to exceed .1?

SOLUTION Let $P_n$ denote the probability of structural damage when there are $n$ cars on the
bridge. That is,

$$
\begin{aligned}
P_n &= P\{X_1 + \cdots + X_n \ge W\} \\
&= P\{X_1 + \cdots + X_n - W \ge 0\}
\end{aligned}
$$

where $X_i$ is the weight of the $i$th car, $i = 1, \ldots, n$. Now it follows from the central
limit theorem that $\sum_{i=1}^{n} X_i$ is approximately normal with mean $3n$ and variance $.09n$.
Hence, since $W$ is independent of the $X_i$, $i = 1, \ldots, n$, and is also normal, it follows that
$\sum_{i=1}^{n} X_i - W$ is approximately normal, with mean and variance given by

$$
\begin{aligned}
E\left[\sum_{i=1}^{n} X_i - W\right] &= 3n - 400 \\
\text{Var}\left(\sum_{i=1}^{n} X_i - W\right) &= \text{Var}\left(\sum_{i=1}^{n} X_i\right) + \text{Var}(W) = .09n + 1{,}600
\end{aligned}
$$

Therefore, if we let

$$Z = \frac{\sum_{i=1}^{n} X_i - W - (3n - 400)}{\sqrt{.09n + 1{,}600}}$$

then

$$P_n = P\left\{Z \ge \frac{-(3n - 400)}{\sqrt{.09n + 1{,}600}}\right\}$$

where $Z$ is approximately a standard normal random variable. Now $P\{Z \ge 1.28\} \approx .1$,
and so if the number of cars $n$ is such that

$$\frac{400 - 3n}{\sqrt{.09n + 1{,}600}} \le 1.28$$

or

$$n \ge 117$$

then there is at least 1 chance in 10 that structural damage will occur. ■
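The step from the inequality to $n \ge 117$ can be confirmed by a direct search. The sketch below scans $n = 1, 2, \ldots$ until $(400 - 3n)/\sqrt{.09n + 1{,}600}$ first drops to 1.28 or below. It also reports the approximate damage probability at that $n$. The helper names are our own.

```python
import math


def standardized_margin(n: int) -> float:
    """(400 - 3n) / sqrt(.09n + 1600): damage occurs when Z is at least this value."""
    return (400 - 3 * n) / math.sqrt(0.09 * n + 1600)


def normal_tail(x: float) -> float:
    """P{Z > x} for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


n = 1
while standardized_margin(n) > 1.28:  # 1.28 is the text's value with P{Z >= 1.28} ~ .1
    n += 1
print(n, round(normal_tail(standardized_margin(n)), 4))
```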

The central limit theorem is illustrated by Program 6.1 on the text disk. This program plots the probability mass function of the sum of n independent and identically distributed random variables that each take on one of the values 0, 1, 2, 3, 4. When using it, one enters the probabilities of these five values and the desired value of n. Figures 6.2(a)-(f) give the resulting plot for a specified set of probabilities when n = 1, 3, 5, 10, 25, 100.
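The text disk program itself is not reproduced here. As a stand-in, the following Python sketch computes the same kind of output (the mass function of the sum of n i.i.d. random variables on {0, 1, 2, 3, 4}, with its mean and variance) by repeated discrete convolution. The function names and the probabilities in `p` are hypothetical choices for illustration, not the ones used to draw Figure 6.2.

```python
def pmf_of_sum(p, n):
    """Mass function of X_1 + ... + X_n, where the X_i are i.i.d. with P{X = k} = p[k]."""
    dist = [1.0]  # mass function of the empty sum, which is always 0
    for _ in range(n):
        new = [0.0] * (len(dist) + len(p) - 1)
        for s, ps in enumerate(dist):      # current partial sum s
            for k, pk in enumerate(p):     # value k of the next variable
                new[s + k] += ps * pk
        dist = new
    return dist

def mean_and_variance(dist):
    mean = sum(k * q for k, q in enumerate(dist))
    var = sum((k - mean) ** 2 * q for k, q in enumerate(dist))
    return mean, var

p = [0.25, 0.15, 0.10, 0.20, 0.30]  # hypothetical P{X = 0}, ..., P{X = 4}
for n in (1, 3, 5, 10, 25, 100):
    dist = pmf_of_sum(p, n)
    mean, var = mean_and_variance(dist)
    print(n, round(mean, 4), round(var, 4))
```

With these probabilities a single variable has mean 2.15 and variance 2.5275, so the sum has mean 2.15n and variance 2.5275n; plotting `dist` for each n gives pictures of the kind shown in Figure 6.2.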

One of the most important applications of the central limit theorem is to binomial random variables. Since such a random variable X having parameters (n, p) represents the number of successes in n independent trials when each trial is a success with probability p, we can express it as

$$X = X_1 + \cdots + X_n$$

where

$$X_i = \begin{cases} 1 & \text{if the } i\text{th trial is a success} \\ 0 & \text{otherwise} \end{cases}$$
FIGURE 6.2 Output of Program 6.1 (the mass function of the sum, along with its mean and variance): (a) n = 1, (b) n = 3, (c) n = 5, (d) n = 10, (e) n = 25, (f) n = 100.

Because

$$E[X_i] = p, \qquad \operatorname{Var}(X_i) = p(1 - p)$$

it follows from the central limit theorem that for n large

$$\frac{X - np}{\sqrt{np(1 - p)}}$$

will approximately be a standard normal random variable [see Figure 6.3, which graphically illustrates how the probability mass function of a binomial (n, p) random variable becomes more and more "normal" as n becomes larger and larger].

EXAMPLE 6.3c The ideal size of a first-year class at a particular college is 150 students. 
The college, knowing from past experience that, on the average, only 30 percent of those 
accepted for admission will actually attend, uses a policy of approving the applications of 
450 students. Compute the probability that more than 150 first-year students attend this 
college. 

FIGURE 6.3 Binomial probability mass functions converging to the normal density.

SOLUTION Let X denote the number of students that attend; then, assuming that each accepted applicant will independently attend, it follows that X is a binomial random variable with parameters n = 450 and p = .3. Since the binomial is a discrete and the normal a continuous distribution, it is best to compute $P\{X = i\}$ as $P\{i - .5 < X < i + .5\}$ when applying the normal approximation (this is called the continuity correction). This yields the approximation



$$P\{X > 150.5\} = P\left\{\frac{X - (450)(.3)}{\sqrt{450(.3)(.7)}} > \frac{150.5 - (450)(.3)}{\sqrt{450(.3)(.7)}}\right\} \approx P\{Z > 1.59\} \approx .06$$



Hence, only 6 percent of the time do more than 150 of the first 450 accepted actually 
attend. ■ 
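To see what the continuity correction buys, the following Python sketch compares the exact binomial probability of more than 150 attendees with the normal approximation taken at 150.5 (with the correction) and at 150 (without it). It assumes, as the example does, that accepted applicants attend independently with probability .3; only the standard library is used.

```python
import math
from statistics import NormalDist

n, p = 450, 0.3
mean = n * p                          # 135
sd = math.sqrt(n * p * (1 - p))       # sqrt(94.5), about 9.72
normal = NormalDist(mean, sd)

# Exact binomial probability that more than 150 attend, i.e. P{X >= 151}
exact = sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(151, n + 1))

# Normal approximation with and without the continuity correction
with_cc = 1 - normal.cdf(150.5)
without_cc = 1 - normal.cdf(150)

print(f"exact {exact:.4f}, with correction {with_cc:.4f}, without {without_cc:.4f}")
```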

It should be noted that we now have two possible approximations to binomial probabilities: the Poisson approximation, which yields a good approximation when n is large and p small, and the normal approximation, which can be shown to be quite good when $np(1 - p)$ is large. [The normal approximation will, in general, be quite good for values of n satisfying $np(1 - p) > 10$.]
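The contrast between the two approximations can be checked directly. The Python sketch below computes, for a given (n, p), the largest absolute error over all k of the Poisson approximation and of the continuity-corrected normal approximation to the binomial cdf. The two (n, p) pairs are hypothetical, chosen so that $np(1 - p)$ is small (1.96) in one case and large (25) in the other.

```python
import math
from statistics import NormalDist

def poisson_cdf(k, lam):
    """P{Y <= k} for a Poisson random variable Y with mean lam."""
    term = math.exp(-lam)
    total = term
    for j in range(1, k + 1):
        term *= lam / j
        total += term
    return total

def max_cdf_errors(n, p):
    """Largest absolute error, over k = 0..n, of the Poisson and of the
    continuity-corrected normal approximations to the binomial (n, p) cdf."""
    normal = NormalDist(n * p, math.sqrt(n * p * (1 - p)))
    cdf = poisson_err = normal_err = 0.0
    for k in range(n + 1):
        cdf += math.comb(n, k) * p**k * (1 - p) ** (n - k)   # exact binomial cdf at k
        poisson_err = max(poisson_err, abs(cdf - poisson_cdf(k, n * p)))
        normal_err = max(normal_err, abs(cdf - normal.cdf(k + 0.5)))
    return poisson_err, normal_err

for n, p in [(100, 0.02), (100, 0.5)]:
    poisson_err, normal_err = max_cdf_errors(n, p)
    print(n, p, round(poisson_err, 4), round(normal_err, 4))
```

In the small-p case the Poisson error is the smaller of the two; when $np(1 - p)$ is large the normal approximation wins by a wide margin.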

6.3.1 Approximate Distribution of the Sample Mean

Let $X_1, \ldots, X_n$ be a sample from a population having mean $\mu$ and variance $\sigma^2$. The central limit theorem can be used to approximate the distribution of the sample mean

$$\bar X = \sum_{i=1}^n X_i/n$$

Since a constant multiple of a normal random variable is also normal, it follows from the central limit theorem that $\bar X$ will be approximately normal when the sample size n is large. Since the sample mean has expected value $\mu$ and standard deviation $\sigma/\sqrt n$, it then follows that

$$\frac{\bar X - \mu}{\sigma/\sqrt n}$$

has approximately a standard normal distribution.

EXAMPLE 6.3d The weights of a population of workers have mean 167 and standard 
deviation 27. 

(a) If a sample of 36 workers is chosen, approximate the probability that the sample mean of their weights lies between 163 and 171.

(b) Repeat part (a) when the sample is of size 144.

SOLUTION Let Z be a standard normal random variable.

(a) It follows from the central limit theorem that $\bar X$ is approximately normal with mean 167 and standard deviation $27/\sqrt{36} = 4.5$. Therefore,

$$\begin{aligned}
P\{163 < \bar X < 171\} &= P\left\{\frac{163 - 167}{4.5} < \frac{\bar X - 167}{4.5} < \frac{171 - 167}{4.5}\right\} \\
&= P\left\{-.8889 < \frac{\bar X - 167}{4.5} < .8889\right\} \\
&\approx 2P\{Z < .8889\} - 1 \\
&\approx .6259
\end{aligned}$$

(b) For a sample of size 144, the sample mean will be approximately normal with mean 167 and standard deviation $27/\sqrt{144} = 2.25$. Therefore,

$$\begin{aligned}
P\{163 < \bar X < 171\} &= P\left\{\frac{163 - 167}{2.25} < \frac{\bar X - 167}{2.25} < \frac{171 - 167}{2.25}\right\} \\
&= P\left\{-1.7778 < \frac{\bar X - 167}{2.25} < 1.7778\right\} \\
&\approx 2P\{Z < 1.7778\} - 1 \\
&\approx .9246
\end{aligned}$$

Thus increasing the sample size from 36 to 144 increases the probability from .6259 
to .9246. ■ 
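The two probabilities in this example can be reproduced with the sketch below, which applies the central limit theorem approximation $\bar X \approx N(\mu, \sigma^2/n)$. It uses the interval from 163 to 171, which is the interval that the standardized limits ±.8889 (n = 36) and ±1.7778 (n = 144) in the solution correspond to. The function name `prob_mean_between` is a hypothetical helper, not from the text.

```python
import math
from statistics import NormalDist

def prob_mean_between(lo, hi, mu, sigma, n):
    """CLT approximation to P{lo < sample mean < hi} for a sample of size n."""
    approx = NormalDist(mu, sigma / math.sqrt(n))  # sample mean is roughly N(mu, sigma^2/n)
    return approx.cdf(hi) - approx.cdf(lo)

for n in (36, 144):
    print(n, round(prob_mean_between(163, 171, 167, 27, n), 4))
```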
EXAMPLE 6.3e An astronomer wants to measure the distance from her observatory to
a distant star. However, due to atmospheric disturbances, any measurement will not 
yield the exact distance d. As a result, the astronomer has decided to make a series of 
measurements and then use their average value as an estimate of the actual distance. If 
the astronomer believes that the values of the successive measurements are independent 
random variables with a mean of d light years and a standard deviation of 2 light years, 
how many measurements need she make to be at least 95 percent certain that her estimate 
is accurate to within ± .5 light years? 

SOLUTION If the astronomer makes n measurements, then $\bar X$, the sample mean of these measurements, will be approximately a normal random variable with mean d and standard deviation $2/\sqrt n$. Thus, the probability that it will lie between d ± .5 is obtained as follows:

$$\begin{aligned}
P\{-.5 < \bar X - d < .5\} &= P\left\{\frac{-.5}{2/\sqrt n} < \frac{\bar X - d}{2/\sqrt n} < \frac{.5}{2/\sqrt n}\right\} \\
&\approx P\{-\sqrt n/4 < Z < \sqrt n/4\} \\
&= 2P\{Z < \sqrt n/4\} - 1
\end{aligned}$$

where Z is a standard normal random variable. 

Thus, the astronomer should make n measurements, where n is such that

$$2P\{Z < \sqrt n/4\} - 1 \ge .95$$

or, equivalently,

$$P\{Z < \sqrt n/4\} \ge .975$$

Since $P\{Z < 1.96\} = .975$, it follows that n should be chosen so that

$$\sqrt n/4 \ge 1.96$$

that is, $n \ge (4 \times 1.96)^2 = 61.47$. Thus, at least 62 observations are necessary. ■
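The same reasoning gives a general recipe: to have the sample mean within ±e of d with probability at least 1 − α, take $n \ge (z\sigma/e)^2$, where $P\{Z < z\} = 1 - \alpha/2$. The sketch below is a minimal Python version under the normal approximation used in the example; `measurements_needed` is a hypothetical name.

```python
import math
from statistics import NormalDist

def measurements_needed(sigma, half_width, confidence):
    """Smallest n with P{|sample mean - d| < half_width} >= confidence,
    using the normal approximation to the sample mean."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # about 1.96 for .95
    return math.ceil((z * sigma / half_width) ** 2)

print(measurements_needed(sigma=2, half_width=0.5, confidence=0.95))
```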

6.3.2 How Large a Sample Is Needed? 

The central limit theorem leaves open the question of how large the sample size n needs to be for the normal approximation to be valid, and indeed the answer depends on the population distribution of the sample data. For instance, if the underlying population distribution is normal, then the sample mean $\bar X$ will also be normal regardless of the sample size. A general rule of thumb is that one can be confident of the normal approximation whenever the sample size n is at least 30. That is, practically speaking, no matter how nonnormal the underlying population distribution is, the sample mean of a sample of size at least 30 will be approximately normal. In most cases, the normal approximation is valid for much smaller sample sizes. Indeed, a sample of size 5 will often suffice for the approximation to be valid. Figure 6.4 presents the distribution of the sample means from an exponential population distribution for samples of sizes n = 1, 5, 10.

FIGURE 6.4 Densities of the average of n exponential random variables having mean 1.
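A quick simulation shows the same effect that Figure 6.4 displays. The Python sketch below draws many samples of size n from an exponential population with mean 1, records each sample mean, and reports the mean, variance, and skewness of those sample means. The seed, number of repetitions, and function names are arbitrary choices for illustration.

```python
import random
import statistics

def simulate_sample_means(n, reps, rng):
    """Sample means of n exponential (mean 1) observations, repeated reps times."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

def skewness(xs):
    """Third standardized moment of the data; 0 for a symmetric distribution."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs, m)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

rng = random.Random(2024)  # fixed seed so the run is reproducible
for n in (1, 5, 10):
    means = simulate_sample_means(n, 20_000, rng)
    print(n, round(statistics.fmean(means), 3),
          round(statistics.pvariance(means), 3), round(skewness(means), 3))
```

The simulated means stay near 1 and the variances near 1/n, while the skewness falls roughly like $2/\sqrt n$ (its exact value for the average of n such exponentials), which is one numerical way of seeing the densities become more symmetric.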

6.4 THE SAMPLE VARIANCE 

Let $X_1, \ldots, X_n$ be a random sample from a distribution with mean $\mu$ and variance $\sigma^2$. Let $\bar X$ be the sample mean, and recall the following definition from Section 2.3.2.



Definition

The statistic $S^2$, defined by

$$S^2 = \frac{\sum_{i=1}^n (X_i - \bar X)^2}{n - 1}$$

is called the sample variance. $S = \sqrt{S^2}$ is called the sample standard deviation.

To compute $E[S^2]$, we use an identity that was proven in Section 2.3.2: For any numbers $x_1, \ldots, x_n$

$$\sum_{i=1}^n (x_i - \bar x)^2 = \sum_{i=1}^n x_i^2 - n\bar x^2$$

where $\bar x = \sum_{i=1}^n x_i/n$. It follows from this identity that

$$(n - 1)S^2 = \sum_{i=1}^n X_i^2 - n\bar X^2$$



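Taking expectations of both sides of the preceding yields, upon using the fact that for any random variable W, $E[W^2] = \operatorname{Var}(W) + (E[W])^2$,

$$\begin{aligned}
(n - 1)E[S^2] &= E\left[\sum_{i=1}^n X_i^2\right] - nE[\bar X^2] \\
&= nE[X_1^2] - nE[\bar X^2] \\
&= n\operatorname{Var}(X_1) + n(E[X_1])^2 - n\operatorname{Var}(\bar X) - n(E[\bar X])^2 \\
&= n\sigma^2 + n\mu^2 - n(\sigma^2/n) - n\mu^2 \\
&= (n - 1)\sigma^2
\end{aligned}$$

Dividing both sides by n − 1 gives

$$E[S^2] = \sigma^2$$

That is, the expected value of the sample variance $S^2$ is equal to the population variance $\sigma^2$.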
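Both the computing identity and the result $E[S^2] = \sigma^2$ can be checked numerically. The Python sketch below computes $S^2$ from the definition and from the identity $(n - 1)S^2 = \sum X_i^2 - n\bar X^2$, compares both with `statistics.variance` (which also divides by n − 1), and then averages $S^2$ over many simulated normal samples. The data set, seed, and sample size are hypothetical choices for illustration.

```python
import random
import statistics

def sample_variance(xs):
    """S^2 from the definition: sum of squared deviations divided by n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def sample_variance_shortcut(xs):
    """S^2 via the identity (n - 1) S^2 = sum of x_i^2 - n * xbar^2."""
    n = len(xs)
    xbar = sum(xs) / n
    return (sum(x * x for x in xs) - n * xbar**2) / (n - 1)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(sample_variance(data), sample_variance_shortcut(data), statistics.variance(data))

# Monte Carlo check of E[S^2] = sigma^2 for samples of size 5 with sigma = 2
rng = random.Random(1)
reps, n, sigma = 20_000, 5, 2.0
avg_s2 = statistics.fmean(
    sample_variance([rng.gauss(0.0, sigma) for _ in range(n)]) for _ in range(reps)
)
print(round(avg_s2, 2))  # compare with sigma^2 = 4
```

The shortcut form is convenient by hand, but in floating point it can lose precision when the mean is large relative to the spread, which is why the definition is often preferred in code.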



6.5 SAMPLING DISTRIBUTIONS FROM A NORMAL POPULATION

Let $X_1, X_2, \ldots, X_n$ be a sample from a normal population having mean $\mu$ and variance $\sigma^2$. That is, they are independent and $X_i \sim N(\mu, \sigma^2)$, $i = 1, \ldots, n$. Also let

$$\bar X = \sum_{i=1}^n X_i/n$$

and

$$S^2 = \frac{\sum_{i=1}^n (X_i - \bar X)^2}{n - 1}$$

denote the sample mean and sample variance, respectively. We would like to compute their distributions.
6.5.1 Distribution of the Sample Mean

Since the sum of independent normal random variables is normally distributed, it follows that $\bar X$ is normal with mean

$$E[\bar X] = \sum_{i=1}^n \frac{E[X_i]}{n} = \mu$$

and variance

$$\operatorname{Var}(\bar X) = \frac{1}{n^2}\sum_{i=1}^n \operatorname{Var}(X_i) = \sigma^2/n$$

That is, $\bar X$, the average of the sample, is normal with a mean equal to the population mean but with a variance reduced by a factor of 1/n. It follows from this that

$$\frac{\bar X - \mu}{\sigma/\sqrt n}$$

is a standard normal random variable.

6.5.2 Joint Distribution of $\bar X$ and $S^2$

In this section, we not only obtain the distribution of the sample variance $S^2$, but we also discover a fundamental fact about normal samples: namely, that $\bar X$ and $S^2$ are independent, with $(n - 1)S^2/\sigma^2$ having a chi-square distribution with n − 1 degrees of freedom.

To start, for numbers $x_1, \ldots, x_n$, let $y_i = x_i - \mu$, $i = 1, \ldots, n$. Then as $\bar y = \bar x - \mu$, it follows from the identity

$$\sum_{i=1}^n (y_i - \bar y)^2 = \sum_{i=1}^n y_i^2 - n\bar y^2$$

that

$$\sum_{i=1}^n (x_i - \bar x)^2 = \sum_{i=1}^n (x_i - \mu)^2 - n(\bar x - \mu)^2$$

Now, if $X_1, \ldots, X_n$ is a sample from a normal population having mean $\mu$ and variance $\sigma^2$, then we obtain from the preceding identity that

$$\frac{\sum_{i=1}^n (X_i - \mu)^2}{\sigma^2} = \frac{\sum_{i=1}^n (X_i - \bar X)^2}{\sigma^2} + \frac{n(\bar X - \mu)^2}{\sigma^2}$$
or, equivalently,

$$\sum_{i=1}^n \left(\frac{X_i - \mu}{\sigma}\right)^2 = \frac{\sum_{i=1}^n (X_i - \bar X)^2}{\sigma^2} + \left[\frac{\sqrt n(\bar X - \mu)}{\sigma}\right]^2 \tag{6.5.1}$$

Because $(X_i - \mu)/\sigma$, $i = 1, \ldots, n$ are independent standard normals, it follows that the left side of Equation 6.5.1 is a chi-square random variable with n degrees of freedom. Also, as shown in Section 6.5.1, $\sqrt n(\bar X - \mu)/\sigma$ is a standard normal random variable and so its square is a chi-square random variable with 1 degree of freedom. Thus Equation 6.5.1 equates a chi-square random variable having n degrees of freedom to the sum of two random variables, one of which is chi-square with 1 degree of freedom. But it has been established that the sum of two independent chi-square random variables is also chi-square with degrees of freedom equal to the sum of the two degrees of freedom. Thus, it would seem that there is a reasonable possibility that the two terms on the right side of Equation 6.5.1 are independent, with $\sum_{i=1}^n (X_i - \bar X)^2/\sigma^2$ having a chi-square distribution with n − 1 degrees of freedom. Since this result can indeed be established, we have the following fundamental result.

Theorem 6.5.1

If $X_1, \ldots, X_n$ is a sample from a normal population having mean $\mu$ and variance $\sigma^2$, then $\bar X$ and $S^2$ are independent random variables, with $\bar X$ being normal with mean $\mu$ and variance $\sigma^2/n$ and $(n - 1)S^2/\sigma^2$ being chi-square with n − 1 degrees of freedom.

Theorem 6.5.1 not only provides the distributions of $\bar X$ and $S^2$ for a normal population but also establishes the important fact that they are independent. In fact, it turns out that this independence of $\bar X$ and $S^2$ is a unique property of the normal distribution. Its importance will become evident in the following chapters.

EXAMPLE 6.5a The time it takes a central processing unit to process a certain type of 
job is normally distributed with mean 20 seconds and standard deviation 3 seconds. If 
a sample of 15 such jobs is observed, what is the probability that the sample variance will 
exceed 12? 

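SOLUTION Since the sample is of size n = 15 and $\sigma^2 = 9$, write

$$\begin{aligned}
P\{S^2 > 12\} &= P\left\{\frac{14S^2}{9} > \frac{14}{9} \cdot 12\right\} \\
&= P\{\chi^2_{14} > 18.67\} \\
&= 1 - .8221 \quad \text{(from Program 5.8.1a)} \\
&= .1779
\end{aligned}$$

■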
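Program 5.8.1a is not reproduced here. For an even number of degrees of freedom, 2k, the chi-square upper tail has the closed form $P\{\chi^2_{2k} > x\} = e^{-x/2}\sum_{j=0}^{k-1} (x/2)^j/j!$ (a consequence of the chi-square being a gamma distribution), and 14 is even, so the answer above can be checked with the short Python sketch below. The function name is a hypothetical label; for odd degrees of freedom a different method would be needed.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """P{chi-square with df degrees of freedom > x}; closed form valid only for even df.

    Uses P{chi^2_{2k} > x} = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!.
    """
    if df % 2 != 0:
        raise ValueError("this closed form covers only even degrees of freedom")
    half = x / 2
    term = total = 1.0          # j = 0 term of the sum
    for j in range(1, df // 2):
        term *= half / j        # now equals (x/2)^j / j!
        total += term
    return math.exp(-half) * total

n, sigma2 = 15, 9
c = (n - 1) * 12 / sigma2       # 14 * 12 / 9, about 18.67
print(round(chi2_upper_tail_even_df(c, n - 1), 4))
```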
The following corollary of Theorem 6.5.1 will be quite useful in the following chapters.

Corollary 6.5.2

Let $X_1, \ldots, X_n$ be a sample from a normal population with mean $\mu$. If $\bar X$ denotes the sample mean and S the sample standard deviation, then

$$\sqrt n\,\frac{(\bar X - \mu)}{S} \sim t_{n-1}$$

That is, $\sqrt n(\bar X - \mu)/S$ has a t-distribution with n − 1 degrees of freedom.



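Proof

Recall that a t-random variable with n degrees of freedom is defined as the distribution of

$$\frac{Z}{\sqrt{\chi^2_n/n}}$$

where Z is a standard normal random variable that is independent of $\chi^2_n$, a chi-square random variable with n degrees of freedom. By Theorem 6.5.1, $\sqrt n(\bar X - \mu)/\sigma$ is standard normal, $S^2/\sigma^2 = [(n - 1)S^2/\sigma^2]/(n - 1)$ is a chi-square random variable with n − 1 degrees of freedom divided by n − 1, and the two are independent. It then follows that

$$\frac{\sqrt n(\bar X - \mu)/\sigma}{\sqrt{S^2/\sigma^2}} = \frac{\sqrt n(\bar X - \mu)}{S}$$

is a t-random variable with n − 1 degrees of freedom. □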

6.6 SAMPLING FROM A FINITE POPULATION 

Consider a population of N elements, and suppose that p is the proportion of the population that has a certain characteristic of interest; that is, Np elements have this characteristic, and N(1 − p) do not. A sample of size n from this population is said to be a random sample if it is chosen in such a manner that each of the $\binom{N}{n}$ population subsets of size n is equally likely to be the sample. For instance, if the population consists of the three elements a, b, c, then a random sample of size 2 is one that is chosen so that each of the subsets {a, b}, {a, c}, and {b, c} is equally likely to be the sample. A random subset can be chosen sequentially by letting its first element be equally likely to be any of the N elements of the population, then letting its second element be equally likely to be any of the remaining N − 1 elements of the population, and so on.

Suppose now that a random sample of size n has been chosen from a population of size N. For $i = 1, \ldots, n$, let

$$X_i = \begin{cases} 1 & \text{if the } i\text{th member of the sample has the characteristic} \\ 0 & \text{otherwise} \end{cases}$$

Consider now the sum of the $X_i$; that is, consider

$$X = X_1 + X_2 + \cdots + X_n$$

Because the term $X_i$ contributes 1 to the sum if the ith member of the sample has the characteristic and 0 otherwise, it follows that X is equal to the number of members of the sample that possess the characteristic. In addition, the sample mean

$$\bar X = X/n = \sum_{i=1}^n X_i/n$$

is equal to the proportion of the members of the sample that possess the characteristic.

Let us now consider the probabilities associated with the statistics X and $\bar X$. To begin, note that since each of the N members of the population is equally likely to be the ith member of the sample, it follows that

$$P\{X_i = 1\} = \frac{Np}{N} = p$$

Also,

$$P\{X_i = 0\} = 1 - P\{X_i = 1\} = 1 - p$$

That is, each $X_i$ is equal to either 1 or 0 with respective probabilities p and 1 − p.

It should be noted that the random variables $X_1, X_2, \ldots, X_n$ are not independent. For instance, since the second selection is equally likely to be any of the N members of the population, of which Np have the characteristic, it follows that the probability that the second selection has the characteristic is Np/N = p. That is, without any knowledge of the outcome of the first selection,

$$P\{X_2 = 1\} = p$$

However, the conditional probability that $X_2 = 1$, given that the first selection has the characteristic, is

$$P\{X_2 = 1 \mid X_1 = 1\} = \frac{Np - 1}{N - 1}$$

which is seen by noting that if the first selection has the characteristic, then the second selection is equally likely to be any of the remaining N − 1 elements, of which Np − 1 have the characteristic. Similarly, the probability that the second selection has the characteristic given that the first one does not is

$$P\{X_2 = 1 \mid X_1 = 0\} = \frac{Np}{N - 1}$$

Thus, knowing whether or not the first element of the random sample has the characteristic changes the probability for the next element. However, when the population size N is large in relation to the sample size n, this change will be very slight. For instance, if N = 1,000, p = .4, then

$$P\{X_2 = 1 \mid X_1 = 1\} = \frac{399}{999} = .3994$$

which is very close to the unconditional probability that $X_2 = 1$; namely,

$$P\{X_2 = 1\} = .4$$

Similarly, the probability that the second element of the sample has the characteristic given that the first does not is

$$P\{X_2 = 1 \mid X_1 = 0\} = \frac{400}{999} = .4004$$

which is again very close to .4.

Indeed, it can be shown that when the population size N is large with respect to the sample size n, then $X_1, X_2, \ldots, X_n$ are approximately independent. Now if we think of each $X_i$ as representing the result of a trial that is a success if $X_i$ equals 1 and a failure otherwise, it follows that $X = \sum_{i=1}^n X_i$ can be thought of as representing the total number of successes in n trials. Hence, if the $X_i$ were independent, then X would be a binomial random variable with parameters n and p. In other words, when the population size N is large in relation to the sample size n, then the distribution of the number of members of the sample that possess the characteristic is approximately that of a binomial random variable with parameters n and p.

REMARK

Of course, X is a hypergeometric random variable (Section 5.4); and so the preceding shows that a hypergeometric can be approximated by a binomial random variable when the number chosen is small in relation to the total number of elements.
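The closeness of the two distributions is easy to check numerically. The Python sketch below uses the text's population (N = 1,000, p = .4), reproduces the conditional probabilities .3994 and .4004, and prints the hypergeometric and binomial mass functions side by side for a sample of size n = 10, a hypothetical choice that is small relative to N. The function names are illustrative only.

```python
from math import comb

def hypergeometric_pmf(k, N, K, n):
    """P{X = k}: k of the n sampled members have the characteristic, when K of the N do."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

N, p = 1000, 0.4        # population from the text's example
n = 10                  # hypothetical sample size, small relative to N
K = round(N * p)        # 400 members have the characteristic

# Conditional probabilities for the second selection, as computed in the text
print(round((K - 1) / (N - 1), 4), round(K / (N - 1), 4))

for k in range(n + 1):
    print(k, round(hypergeometric_pmf(k, N, K, n), 4), round(binomial_pmf(k, n, p), 4))
```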

For the remainder of this text, we will suppose that the underlying
population is large in relation to the sample size and we will take the
distribution of X to be binomial.

By using the formulas given in Section 5.1 for the mean and standard deviation of a binomial random variable, we see that

$$E[X] = np \quad \text{and} \quad SD(X) = \sqrt{np(1 - p)}$$

Since $\bar X$, the proportion of the sample that has the characteristic, is equal to X/n, we see from the preceding that

$$E[\bar X] = E[X]/n = p$$

and

$$SD(\bar X) = SD(X)/n = \sqrt{p(1 - p)/n}$$

EXAMPLE 6.6a Suppose that 45 percent of the population favors a certain candidate in an 
upcoming election. If a random sample of size 200 is chosen, find 

(a) the expected value and standard deviation of the number of members of the 
sample that favor the candidate; 

(b) the probability that more than half the members of the sample favor the candidate. 

SOLUTION

(a) The expected value and standard deviation of the number that favor the candidate are

$$E[X] = 200(.45) = 90, \qquad SD(X) = \sqrt{200(.45)(1 - .45)} = 7.0356$$

(b) Since X is binomial with parameters 200 and .45, the text disk gives the solution

$$P\{X \ge 101\} = .0681$$

If this program were not available, then the normal approximation to the binomial (Section 6.3) could be used:

$$\begin{aligned}
P\{X \ge 101\} &= P\{X \ge 100.5\} \quad \text{(the continuity correction)} \\
&= P\left\{\frac{X - 90}{7.0356} \ge \frac{100.5 - 90}{7.0356}\right\} \\
&\approx P\{Z \ge 1.4924\} \\
&\approx .0678
\end{aligned}$$

The solution obtained by the normal approximation is correct to 3 decimal places. ■
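Both numbers in part (b) can be recomputed without the text disk. The Python sketch below sums the binomial (200, .45) mass function from 101 to 200 for the exact value and applies the continuity-corrected normal approximation for comparison; it uses only the standard library.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 200, 0.45
mean, sd = n * p, sqrt(n * p * (1 - p))   # 90 and about 7.0356

# Exact binomial tail: more than half of 200 means X >= 101
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(101, n + 1))

# Normal approximation with the continuity correction, P{X >= 100.5}
approx = 1 - NormalDist(mean, sd).cdf(100.5)

print(f"mean {mean:.0f}, sd {sd:.4f}, exact {exact:.4f}, normal {approx:.4f}")
```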

Even when each element of the population has more than two possible values, it still 
remains true that if the population size is large in relation to the sample size, then the 
sample data can be regarded as being independent random variables from the population 
distribution. 

EXAMPLE 6.6b According to the U.S. Department of Agriculture's World Livestock Situa- 
tion, the country with the greatest per capita consumption of pork is Denmark. In 1994, 
the amount of pork consumed by a person residing in Denmark had a mean value of 147 
pounds with a standard deviation of 62 pounds. If a random sample of 25 Danes is chosen, 
approximate the probability that the average amount of pork consumed by the members 
of this group in 1994 exceeded 150 pounds. 
SOLUTION If we let $X_i$ be the amount consumed by the ith member of the sample, $i = 1, \ldots, 25$, then the desired probability is

$$P\left\{\frac{X_1 + \cdots + X_{25}}{25} > 150\right\} = P\{\bar X > 150\}$$

where $\bar X$ is the sample mean of the 25 sample values. Since we can regard the $X_i$ as being independent random variables with mean 147 and standard deviation 62, it follows from the central limit theorem that their sample mean will be approximately normal with mean 147 and standard deviation 62/5 = 12.4. Thus, with Z being a standard normal random variable, we have

$$P\{\bar X > 150\} = P\left\{\frac{\bar X - 147}{12.4} > \frac{150 - 147}{12.4}\right\} \approx P\{Z > .242\} \approx .404$$

■




Problems 

1. Plot the probability mass function of the sample mean of $X_1, \ldots, X_n$, when

(a) n = 2;
(b) n = 3.

Assume that the probability mass function of the $X_i$ is

P{X = 0} = .2, P{X = 1} = .3, P{X = 3} = .5

In both cases, determine $E[\bar X]$ and $\operatorname{Var}(\bar X)$.

2. If 10 fair dice are rolled, approximate the probability that the sum of the values obtained (which ranges from 10 to 60) is between 30 and 40 inclusive.

3. Approximate the probability that the sum of 16 independent uniform (0, 1) 
random variables exceeds 10. 

4. A roulette wheel has 38 slots, numbered 0, 00, and 1 through 36. If you bet 
1 on a specified number, you either win 35 if the roulette ball lands on that 
number or lose 1 if it does not. If you continually make such bets, approximate the 
probability that

(a) you are winning after 34 bets; 

(b) you are winning after 1,000 bets; 

(c) you are winning after 100,000 bets. 

Assume that each roll of the roulette ball is equally likely to land on any of the 
38 numbers. 

5. A highway department has enough salt to handle a total of 80 inches of snowfall. Suppose the daily amount of snow has a mean of 1.5 inches and a standard deviation of .3 inches.

(a) Approximate the probability that the salt on hand will suffice for the next 
50 days. 

(b) What assumption did you make in solving part (a)? 

(c) Do you think this assumption is justified? Explain briefly. 

6. Fifty numbers are rounded off to the nearest integer and then summed. If the individual roundoff errors are uniformly distributed between −.5 and .5, what is the approximate probability that the resultant sum differs from the exact sum by more than 3?

7. A six-sided die, in which each side is equally likely to appear, is repeatedly rolled 
until the total of all rolls exceeds 400. Approximate the probability that this will 
require more than 140 rolls. 

8. The amount of time that a certain type of battery functions is a random variable 
with mean 5 weeks and standard deviation 1.5 weeks. Upon failure, it is imme- 
diately replaced by a new battery. Approximate the probability that 13 or more 
batteries will be needed in a year. 

9. The lifetime of a certain electrical part is a random variable with mean 100 hours 
and standard deviation 20 hours. If 16 such parts are tested, find the probability 
that the sample mean is 

(a) less than 104; 

(b) between 98 and 104 hours. 

10. A tobacco company claims that the amount of nicotine in its cigarettes is a random 
variable with mean 2.2 mg and standard deviation .3 mg. However, the sample 
mean nicotine content of 100 randomly chosen cigarettes was 3.1 mg. What is the
approximate probability that the sample mean would have been as high or higher 
than 3.1 if the company's claims were true? 

11. The lifetime (in hours) of a type of electric bulb has expected value 500 and 
standard deviation 80. Approximate the probability that the sample mean of n 
such bulbs is greater than 525 when 

(a) n = 4; 

(b) n = 16; 
(c) n = 36;

(d) n = 64.

12. An instructor knows from past experience that student exam scores have mean 
77 and standard deviation 15. At present the instructor is teaching two separate 
classes — one of size 25 and the other of size 64. 

(a) Approximate the probability that the average test score in the class of size 25 
lies between 72 and 82. 

(b) Repeat part (a) for a class of size 64. 

(c) What is the approximate probability that the average test score in the class of 
size 25 is higher than that of the class of size 64? 

(d) Suppose the average scores in the two classes are 76 and 83. Which class, the 
one of size 25 or the one of size 64, do you think was more likely to have 
averaged 83? 

13. If X is binomial with parameters n = 150, p = .6, compute the exact value of
P{X < 80} and compare with its normal approximation both (a) making use of 
and (b) not making use of the continuity correction. 

14. Each computer chip made in a certain plant will, independently, be defective 
with probability .25. If a sample of 1,000 chips is tested, what is the approximate 
probability that fewer than 200 chips will be defective? 

15. A club basketball team will play a 60-game season. Thirty-two of these games are against class A teams and 28 are against class B teams. The outcomes of all the games are independent. The team will win each game against a class A opponent with probability .5, and it will win each game against a class B opponent with probability .7. Let X denote its total number of victories in the season.

(a) Is X a binomial random variable?

(b) Let $X_A$ and $X_B$ denote, respectively, the number of victories against class A and class B teams. What are the distributions of $X_A$ and $X_B$?

(c) What is the relationship between $X_A$, $X_B$, and X?

(d) Approximate the probability that the team wins 40 or more games.

16. Argue, based on the central limit theorem, that a Poisson random variable having mean λ will approximately have a normal distribution with mean and variance both equal to λ when λ is large. If X is Poisson with mean 100, compute the exact probability that X is less than or equal to 116 and compare it with its normal approximation both when a continuity correction is utilized and when it is not. The convergence of the Poisson to the normal is indicated in Figure 6.5.

17. Use the text disk to compute $P\{X \le 10\}$ when X is a binomial random variable with parameters n = 100, p = .1. Now compare this with its (a) Poisson and (b) normal approximation. In using the normal approximation, write the desired probability as $P\{X < 10.5\}$ so as to utilize the continuity correction.

FIGURE 6.5 Poisson probability mass functions.

18. The temperature at which a thermostat goes off is normally distributed with variance $\sigma^2$. If the thermostat is to be tested five times, find

(a) $P\{S^2/\sigma^2 < 1.8\}$

(b) $P\{.85 < S^2/\sigma^2 < 1.15\}$

where $S^2$ is the sample variance of the five data values.

19. In Problem 18, how large a sample would be necessary to ensure that the probability 
in part (a) is at least .95? 

20. Consider two independent samples — the first of size 1 from a normal population 
having variance 4 and the second of size 5 from a normal population having 
variance 2. Compute the probability that the sample variance from the second 
sample exceeds the one from the first. (Hint: Relate it to the F-distribution.)

21. Twelve percent of the population is left-handed. Find the probability that there are between 10 and 14 left-handers in a random sample of 100 members of this population. That is, find $P\{10 < X < 14\}$, where X is the number of left-handers in the sample.

22. Fifty-two percent of the residents of a certain city are in favor of teaching evolution 
in high school. Find or approximate the probability that at least 50 percent of a 
random sample of size n is in favor of teaching evolution, when 

(a) n = 10; 

(b) n = 100; 

(c) n = 1,000; 

(d) n = 10,000. 

23. The following table gives the percentages of individuals, categorized by gender, that follow certain negative health practices. Suppose a random sample of 300 men is chosen. Approximate the probability that

(a) at least 150 of them rarely eat breakfast;

(b) fewer than 100 of them smoke.

| (percent) | Sleeps 6 Hours or Less per Night | Smoker | Rarely Eats Breakfast | Is 20 Percent or More Overweight |
| --- | --- | --- | --- | --- |
| Men | 22.7 | 28.4 | 45.4 | 29.6 |
| Women | 21.4 | 22.8 | 42.0 | 25.6 |

Source: U.S. National Center for Health Statistics, Health Promotion and Disease Prevention, 1990.



24. (Use the table from Problem 23.) Suppose a random sample of 300 women is 
chosen. Approximate the probability that 
(a) at least 60 of them are overweight by 20 percent or more;

(b) fewer than 50 of them sleep 6 hours or less nightly. 

25. (Use the table from Problem 23.) Suppose random samples of 300 women and 
of 300 men are chosen. Approximate the probability that more women than men 
rarely eat breakfast. 

26. The following table uses 1989 data concerning the percentages of male and female 
full-time workers whose annual salaries fall in different salary groupings. Suppose 
random samples of 1,000 men and 1,000 women were chosen. Use the table to 
approximate the probability that 

(a) at least half of the women earned less than $20,000; 

(b) more than half of the men earned $20,000 or more; 

(c) more than half of the women and more than half of the men earned $20,000 
or more; 

(d) 250 or fewer of the women earned at least $25,000; 

(e) at least 200 of the men earned $50,000 or more; 

(f) more women than men earned between $20,000 and $24,999. 

| Earnings Range | Percentage of Women | Percentage of Men |
| --- | --- | --- |
| $4,999 or less | 2.8 | 1.8 |
| $5,000 to $9,999 | 10.4 | 4.7 |
| $10,000 to $19,999 | 41.0 | 23.1 |
| $20,000 to $24,999 | 16.5 | 13.4 |
| $25,000 to $49,999 | 26.3 | 42.1 |
| $50,000 and over | 3.0 | 14.9 |

Source: U.S. Department of Commerce, Bureau of the Census.

27. In 1995 the percentage of the labor force that belonged to a union was 14.9. If five workers had been randomly chosen in that year, what is the probability that none of them would have belonged to a union? Compare your answer to what it would be for the year 1945, when an all-time high of 35.5 percent of the labor force belonged to a union.

28. The sample mean and sample standard deviation of all San Francisco student scores on the most recent Scholastic Aptitude Test examination in mathematics were 517 and 120. Approximate the probability that a random sample of 144 students would have an average score exceeding

(a) 507;

(b) 517;

(c) 537;

(d) 550.
29. The average salary of newly graduated students with bachelor's degrees in chemical
engineering is $43,600, with a standard deviation of $3,200. Approximate the 
probability that the average salary of a sample of 12 recently graduated chemical 
engineers exceeds $45,000. 

30. A certain component is critical to the operation of an electrical system and must be 
replaced immediately upon failure. If the mean lifetime of this type of component 
is 100 hours and its standard deviation is 30 hours, how many of the components 
must be in stock so that the probability that the system is in continual operation 
for the next 2000 hours is at least .95? 
PARAMETER ESTIMATION



7.1 INTRODUCTION 

Let $X_1, \ldots, X_n$ be a random sample from a distribution $F_\theta$ that is specified up to a vector of unknown parameters $\theta$. For instance, the sample could be from a Poisson distribution whose mean value is unknown; or it could be from a normal distribution having an unknown mean and variance. Whereas in probability theory it is usual to suppose that all of the parameters of a distribution are known, the opposite is true in statistics, where a central problem is to use the observed data to make inferences about the unknown parameters.

In Section 7.2, we present the maximum likelihood method for determining estimators 
of unknown parameters. The estimates so obtained are called point estimates, because they 
specify a single quantity as an estimate of $\theta$. In Section 7.3, we consider the problem 
of obtaining interval estimates. In this case, rather than specifying a certain value as our 
estimate of $\theta$, we specify an interval in which we estimate that $\theta$ lies. Additionally, we 
consider the question of how much confidence we can attach to such an interval estimate. 
We illustrate by showing how to obtain an interval estimate of the unknown mean of 
a normal distribution whose variance is specified. We then consider a variety of interval 
estimation problems. In Section 7.3.1, we present an interval estimate of the mean of a 
normal distribution whose variance is unknown. In Section 7.3.2, we obtain an interval 
estimate of the variance of a normal distribution. In Section 7.4, we determine an interval 
estimate for the difference of two normal means, both when their variances are assumed to 
be known and when they are assumed to be unknown (although in the latter case we suppose 
that the unknown variances are equal). In Sections 7.5 and the optional Section 7.6, we 
present interval estimates of the mean of a Bernoulli random variable and the mean of an 
exponential random variable. 

In the optional Section 7.7, we return to the general problem of obtaining point 
estimates of unknown parameters and show how to evaluate an estimator by considering its 
mean square error. The bias of an estimator is discussed, and its relationship to the mean 
square error is explored. 

In the optional Section 7.8, we consider the problem of determining an estimate of an 
unknown parameter when there is some prior information available. This is the Bayesian 
approach, which supposes that prior to observing the data, information about $\theta$ is always 
available to the decision maker, and that this information can be expressed in terms of 
a probability distribution on $\theta$. In such a situation, we show how to compute the Bayes 
estimator, which is the estimator whose expected squared distance from $\theta$ is minimal. 

7.2 MAXIMUM LIKELIHOOD ESTIMATORS 

Any statistic used to estimate the value of an unknown parameter $\theta$ is called an estimator 
of $\theta$. The observed value of the estimator is called the estimate. For instance, as we shall 
see, the usual estimator of the mean of a normal population, based on a sample $X_1, \ldots, X_n$ 
from that population, is the sample mean $\bar{X} = \sum_i X_i/n$. If a sample of size 3 yields the 
data $X_1 = 2$, $X_2 = 3$, $X_3 = 4$, then the estimate of the population mean, resulting from 
the estimator $\bar{X}$, is the value 3. 

Suppose that the random variables $X_1, \ldots, X_n$, whose joint distribution is assumed 
given except for an unknown parameter $\theta$, are to be observed. The problem of interest 
is to use the observed values to estimate $\theta$. For example, the $X_i$'s might be independent, 
exponential random variables each having the same unknown mean $\theta$. In this case, the 
joint density function of the random variables would be given by 

$$
\begin{aligned}
f(x_1, x_2, \ldots, x_n) &= f_{X_1}(x_1) f_{X_2}(x_2) \cdots f_{X_n}(x_n) \\
&= \frac{1}{\theta} e^{-x_1/\theta} \, \frac{1}{\theta} e^{-x_2/\theta} \cdots \frac{1}{\theta} e^{-x_n/\theta}, \qquad 0 < x_i < \infty,\ i = 1, \ldots, n \\
&= \frac{1}{\theta^n} \exp\left\{ -\frac{\sum_{i=1}^n x_i}{\theta} \right\}, \qquad 0 < x_i < \infty,\ i = 1, \ldots, n
\end{aligned}
$$

and the objective would be to estimate $\theta$ from the observed data $X_1, X_2, \ldots, X_n$. 

A particular type of estimator, known as the maximum likelihood estimator, is widely 
used in statistics. It is obtained by reasoning as follows. Let $f(x_1, \ldots, x_n \mid \theta)$ denote the joint 
probability mass function of the random variables $X_1, X_2, \ldots, X_n$ when they are discrete, 
and let it be their joint probability density function when they are jointly continuous 
random variables. Because $\theta$ is assumed unknown, we also write $f$ as a function of $\theta$. 
Now since $f(x_1, \ldots, x_n \mid \theta)$ represents the likelihood that the values $x_1, x_2, \ldots, x_n$ will be 
observed when $\theta$ is the true value of the parameter, it would seem that a reasonable estimate 
of $\theta$ would be that value yielding the largest likelihood of the observed values. In other 
words, the maximum likelihood estimate $\hat{\theta}$ is defined to be that value of $\theta$ maximizing 
$f(x_1, \ldots, x_n \mid \theta)$ where $x_1, \ldots, x_n$ are the observed values. The function $f(x_1, \ldots, x_n \mid \theta)$ is 
often referred to as the likelihood function of $\theta$. 

In determining the maximizing value of $\theta$, it is often useful to use the fact that 
$f(x_1, \ldots, x_n \mid \theta)$ and $\log[f(x_1, \ldots, x_n \mid \theta)]$ have their maximum at the same value of $\theta$. 
Hence, we may also obtain $\hat{\theta}$ by maximizing $\log[f(x_1, \ldots, x_n \mid \theta)]$. 
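To make the definition concrete, here is a minimal sketch for the exponential model above. It evaluates the log-likelihood $\log f(x_1, \ldots, x_n \mid \theta) = -n\log\theta - \sum x_i/\theta$ on a grid of candidate values of $\theta$ and keeps the largest one. The data values, the grid limits, and the function names are illustrative assumptions, not taken from the text. For this model, setting the derivative $-n/\theta + \sum x_i/\theta^2$ to zero gives $\hat{\theta} = \bar{x}$, so the grid search should land next to the sample mean.

```python
import math

def exp_log_likelihood(theta, xs):
    """log f(x_1, ..., x_n | theta) for i.i.d. exponential data with mean theta."""
    n = len(xs)
    return -n * math.log(theta) - sum(xs) / theta

def grid_mle(xs, lo=0.01, hi=20.0, steps=20000):
    """Return the grid value of theta with the largest log-likelihood."""
    best_theta, best_ll = None, -math.inf
    for k in range(steps + 1):
        theta = lo + (hi - lo) * k / steps
        ll = exp_log_likelihood(theta, xs)
        if ll > best_ll:
            best_theta, best_ll = theta, ll
    return best_theta

# Illustrative (made-up) lifetimes; not data from the text.
data = [1.2, 0.4, 3.1, 2.2, 0.9, 1.7]
theta_hat = grid_mle(data)
print(theta_hat, sum(data) / len(data))   # grid maximizer vs. sample mean
```

Maximizing the log rather than the likelihood itself changes nothing about where the maximum is; it only turns the product of densities into a sum.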

EXAMPLE 7.2a Maximum Likelihood Estimator of a Bernoulli Parameter Suppose that $n$ 
independent trials, each of which is a success with probability $p$, are performed. What is the 
maximum likelihood estimator of $p$? 

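SOLUTION The data consist of the values of $X_1, \ldots, X_n$ where 

$$
X_i = \begin{cases} 1 & \text{if trial } i \text{ is a success} \\ 0 & \text{otherwise} \end{cases}
$$

Now 

$$
P\{X_i = 1\} = p = 1 - P\{X_i = 0\}
$$

which can be succinctly expressed as 

$$
P\{X_i = x\} = p^x (1 - p)^{1 - x}, \qquad x = 0, 1
$$

Hence, by the assumed independence of the trials, the likelihood (that is, the joint 
probability mass function) of the data is given by 

$$
\begin{aligned}
f(x_1, \ldots, x_n \mid p) &= P\{X_1 = x_1, \ldots, X_n = x_n \mid p\} \\
&= p^{x_1}(1 - p)^{1 - x_1} \cdots p^{x_n}(1 - p)^{1 - x_n} \\
&= p^{\sum_1^n x_i} (1 - p)^{n - \sum_1^n x_i}, \qquad x_i = 0, 1,\ i = 1, \ldots, n
\end{aligned}
$$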

To determine the value of $p$ that maximizes the likelihood, first take logs to obtain 

$$
\log f(x_1, \ldots, x_n \mid p) = \sum_{i=1}^n x_i \log p + \left( n - \sum_{i=1}^n x_i \right) \log(1 - p)
$$

Differentiation yields 

$$
\frac{d}{dp} \log f(x_1, \ldots, x_n \mid p) = \frac{\sum_{i=1}^n x_i}{p} - \frac{n - \sum_{i=1}^n x_i}{1 - p}
$$

Upon equating to zero and solving, we obtain that the maximum likelihood estimate $\hat{p}$ 
satisfies 

$$
\frac{\sum_{i=1}^n x_i}{\hat{p}} = \frac{n - \sum_{i=1}^n x_i}{1 - \hat{p}}
$$

or 

$$
\hat{p} = \frac{\sum_{i=1}^n x_i}{n}
$$

Hence, the maximum likelihood estimator of the unknown mean of a Bernoulli 
distribution is given by 

$$
d(X_1, \ldots, X_n) = \frac{\sum_{i=1}^n X_i}{n}
$$

Since $\sum_{i=1}^n X_i$ is the number of successful trials, we see that the maximum likelihood 
estimator of $p$ is equal to the proportion of the observed trials that result in successes. 
For an illustration, suppose that each RAM (random access memory) chip produced by 
a certain manufacturer is, independently, of acceptable quality with probability $p$. Then 
if out of a sample of 1,000 tested 921 are acceptable, it follows that the maximum likelihood 
estimate of $p$ is .921. ■ 
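The closed form can be checked numerically. The following minimal sketch codes the RAM-chip data as 921 ones and 79 zeros (the 0/1 coding is an assumption consistent with the definition of $X_i$ above) and compares the Bernoulli log-likelihood at $\hat{p}$ with nearby values of $p$. The helper names are illustrative.

```python
import math

def bernoulli_mle(xs):
    """Maximum likelihood estimate of p: the proportion of 1s (successes)."""
    return sum(xs) / len(xs)

def bernoulli_log_likelihood(p, xs):
    """log f(x_1, ..., x_n | p) = (sum x_i) log p + (n - sum x_i) log(1 - p)."""
    s, n = sum(xs), len(xs)
    return s * math.log(p) + (n - s) * math.log(1 - p)

# RAM-chip illustration: 921 acceptable chips (coded 1) out of 1,000 tested.
chips = [1] * 921 + [0] * 79
p_hat = bernoulli_mle(chips)
print(p_hat)                       # 0.921
for p in (0.90, 0.92, p_hat, 0.93):
    print(p, round(bernoulli_log_likelihood(p, chips), 3))
```

The printed log-likelihoods peak at $\hat{p} = .921$, as the derivative argument predicts.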

EXAMPLE 7.2b Two proofreaders were given the same manuscript to read. If proofreader 
1 found $n_1$ errors, and proofreader 2 found $n_2$ errors, with $n_{1,2}$ of these errors being found 
by both proofreaders, estimate $N$, the total number of errors that are in the manuscript. 

SOLUTION Before we can estimate $N$ we need to make some assumptions about the 
underlying probability model. So let us assume that the results of the proofreaders are 
independent, and that each error in the manuscript is independently found by proofreader 
$i$ with probability $p_i$, $i = 1, 2$. 

To estimate $N$, we will start by deriving an estimator of $p_1$. To do so, note that each 
of the $n_2$ errors found by reader 2 will, independently, be found by proofreader 1 with 
probability $p_1$. Because proofreader 1 found $n_{1,2}$ of those $n_2$ errors, a reasonable estimate 
of $p_1$ is given by 

$$
\hat{p}_1 = \frac{n_{1,2}}{n_2}
$$

However, because proofreader 1 found $n_1$ of the $N$ errors in the manuscript, it is reasonable 
to suppose that $p_1$ is also approximately equal to $n_1/N$. Equating this to $\hat{p}_1$ gives that 

$$
\frac{n_{1,2}}{n_2} \approx \frac{n_1}{N}
$$

or 

$$
N \approx \frac{n_1 n_2}{n_{1,2}}
$$

Because the preceding estimate is symmetric in $n_1$ and $n_2$, it follows that it is the same 
no matter which proofreader is designated as proofreader 1. 

An interesting application of the preceding occurred when two teams of researchers 
recently announced that they had decoded the human genetic code sequence. As part 
of their work both teams estimated that the human genome consisted of approximately 
33,000 genes. Because both teams independently arrived at the same number, many 
scientists found this number believable. However, most scientists were quite surprised by 
this relatively small number of genes; by comparison it is only about twice as many as a 
fruit fly has. However, a closer inspection of the findings indicated that the two groups 
only agreed on the existence of about 17,000 genes. (That is, 17,000 genes were found by 
both teams.) Thus, based on our preceding estimator, we would estimate that the actual 
number of genes, rather than being 33,000, is 

$$
\frac{n_1 n_2}{n_{1,2}} = \frac{33{,}000 \times 33{,}000}{17{,}000} \approx 64{,}000
$$

(Because there is some controversy about whether some of the genes claimed to be found are 
actually genes, 64,000 should probably be taken as an upper bound on the actual number 
of genes.) 
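The two-proofreader estimate is a single formula. The following minimal sketch (the function name is illustrative) reproduces the gene-count figure and guards against the case $n_{1,2} = 0$, where the formula is undefined.

```python
def two_reader_estimate(n1, n2, n12):
    """Estimate the total number of errors N as n1 * n2 / n12."""
    if n12 <= 0:
        raise ValueError("need at least one error found by both readers")
    return n1 * n2 / n12

# Gene-count illustration: 33,000 found by each team, 17,000 found by both.
print(round(two_reader_estimate(33_000, 33_000, 17_000)))   # 64059
```

The exact value is about 64,059, which the text rounds to 64,000. Swapping the first two arguments gives the same result, reflecting the symmetry noted above.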

The estimation approach used when there are two proofreaders does not work when 
there are $m$ proofreaders, when $m > 2$. For, if for each $i$, we let $\hat{p}_i$ be the fraction of the 
errors found by at least one of the other proofreaders $j$ ($j \ne i$) that are also found by $i$, 
and then set that equal to $n_i/N$, where $n_i$ is the number of errors found by proofreader $i$, 
then the estimate of $N$, namely $n_i/\hat{p}_i$, would differ for different values of $i$. Moreover, 
with this approach it is possible that we may have that $\hat{p}_i > \hat{p}_j$ 
even if proofreader $i$ finds fewer errors than does proofreader $j$. For instance, for $m = 3$, 
suppose proofreaders 1 and 2 find exactly the same set of 10 errors whereas proofreader 3 
finds 20 errors with only 1 of them in common with the set of errors found by the others. 
Then, because proofreader 1 (and 2) found 10 of the 29 errors found by at least one of the 
other proofreaders, $\hat{p}_i = 10/29$, $i = 1, 2$. On the other hand, because proofreader 3 only 
found 1 of the 10 errors found by the others, $\hat{p}_3 = 1/10$. Therefore, although proofreader 
3 found twice the number of errors as did proofreader 1, the estimate of $p_3$ is less than 
that of $p_1$. To obtain more reasonable estimates, we could take the preceding values of 
$\hat{p}_i$, $i = 1, \ldots, m$, as preliminary estimates of the $p_i$. Now, let $n_f$ be the number of errors 
that are found by at least one proofreader. Because $n_f/N$ is the fraction of errors that are 
found by at least one proofreader, this should approximately equal $1 - \prod_{i=1}^m (1 - p_i)$, the 
probability that an error is found by at least one proofreader. Therefore, we have 



$$
\frac{n_f}{N} \approx 1 - \prod_{i=1}^m (1 - \hat{p}_i)
$$

suggesting that $N \approx \hat{N}$, where 

$$
\hat{N} = \frac{n_f}{1 - \prod_{i=1}^m (1 - \hat{p}_i)} \tag{7.2.1}
$$

With this estimate of $N$, we can then reset our estimates of the $p_i$ by using 

$$
\hat{p}_i = \frac{n_i}{\hat{N}}, \qquad i = 1, \ldots, m \tag{7.2.2}
$$

We can then reestimate $N$ by using the new values in (7.2.1). (The estimation need not stop 
here; each time we obtain a new estimate $\hat{N}$ of $N$ we can use (7.2.2) to obtain new estimates 
of the $p_i$, which can then be used to obtain a new estimate of $N$, and so on.) ■ 
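The alternation between (7.2.1) and (7.2.2) is easy to automate. The following is a minimal sketch, not the book's procedure verbatim: each proofreader's findings are represented as a set of error labels (the labels are arbitrary and hypothetical), and the loop runs a fixed 200 rounds because the text does not specify a stopping rule. It is applied to the three-proofreader illustration above.

```python
import math

def preliminary_rates(found):
    """Preliminary p_i: share of the errors found by the *other* readers that reader i also found."""
    rates = []
    for i, mine in enumerate(found):
        others = set().union(*(s for j, s in enumerate(found) if j != i))
        rates.append(len(mine & others) / len(others))
    return rates

def iterate_total(found, iterations=200):
    """Alternate Equations (7.2.1) and (7.2.2) a fixed number of times."""
    n_f = len(set().union(*found))     # errors found by at least one reader
    n = [len(s) for s in found]        # n_i, errors found by reader i
    p = preliminary_rates(found)
    for _ in range(iterations):
        n_hat = n_f / (1 - math.prod(1 - pi for pi in p))   # (7.2.1)
        p = [ni / n_hat for ni in n]                        # (7.2.2)
    return n_hat, p

# Readers 1 and 2 find the same 10 errors; reader 3 finds 20, one shared with them.
found = [set(range(10)), set(range(10)), set(range(9, 29))]
print(preliminary_rates(found))   # [10/29, 10/29, 1/10] as floats
n_hat, p = iterate_total(found)
print(round(n_hat, 2), [round(pi, 3) for pi in p])
```

After the first round the estimates are driven by the counts $n_i$ rather than by the preliminary ratios, so proofreader 3's estimated detection probability ends up larger than proofreader 1's, which is the correction the text is after.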

EXAMPLE 7.2c Maximum Likelihood Estimator of a Poisson Parameter Suppose $X_1, \ldots, X_n$ 
are independent Poisson random variables each having mean $\lambda$. Determine the maximum 
likelihood estimator of $\lambda$. 

SOLUTION The likelihood function is given by 

$$
\begin{aligned}
f(x_1, \ldots, x_n \mid \lambda) &= \frac{e^{-\lambda}\lambda^{x_1}}{x_1!} \cdots \frac{e^{-\lambda}\lambda^{x_n}}{x_n!} \\
&= \frac{e^{-n\lambda} \lambda^{\sum_1^n x_i}}{x_1! \cdots x_n!}
\end{aligned}
$$

Thus, 

$$
\log f(x_1, \ldots, x_n \mid \lambda) = -n\lambda + \sum_{i=1}^n x_i \log \lambda - \log c
$$

where $c = \prod_{i=1}^n x_i!$ does not depend on $\lambda$, and 

$$
\frac{d}{d\lambda} \log f(x_1, \ldots, x_n \mid \lambda) = -n + \frac{\sum_{i=1}^n x_i}{\lambda}
$$

By equating to zero, we obtain that the maximum likelihood estimate $\hat{\lambda}$ equals 

$$
\hat{\lambda} = \frac{\sum_{i=1}^n x_i}{n}
$$

and so the maximum likelihood estimator is given by 

$$
d(X_1, \ldots, X_n) = \frac{\sum_{i=1}^n X_i}{n}
$$

For example, suppose that the number of people that enter a certain retail establishment 
in any day is a Poisson random variable having an unknown mean $\lambda$, which must be 
estimated. If after 20 days a total of 857 people have entered the establishment, then 
the maximum likelihood estimate of $\lambda$ is 857/20 = 42.85. That is, we estimate that 
on average, 42.85 customers will enter the establishment on a given day. ■ 

EXAMPLE 7.2d The number of traffic accidents in Berkeley, California, in 10 randomly 
chosen nonrainy days in 1998 is as follows: 

4, 0, 6, 5, 2, 1, 2, 0, 4, 3 

Use these data to estimate the proportion of nonrainy days that had 2 or fewer accidents 
that year. 

SOLUTION Since there are a large number of drivers, each of whom has a small probability 
of being involved in an accident in a given day, it seems reasonable to assume that the daily 
number of traffic accidents is a Poisson random variable. Since 

$$
\bar{X} = \frac{1}{10} \sum_{i=1}^{10} X_i = 2.7
$$

it follows that the maximum likelihood estimate of the Poisson mean is 2.7. Since the 
long-run proportion of nonrainy days that have 2 or fewer accidents is equal to $P\{X \le 2\}$, 
where $X$ is the random number of accidents in a day, it follows that the desired estimate is 

$$
e^{-2.7}\left( 1 + 2.7 + (2.7)^2/2 \right) = .4936
$$

That is, we estimate that a little less than half of the nonrainy days had 2 or fewer 
accidents. ■ 
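Examples 7.2c and 7.2d combine two steps: estimate $\lambda$ by the sample mean, then plug $\hat{\lambda}$ into the Poisson mass function. The following minimal sketch does both with the standard library; the helper names are illustrative.

```python
import math

def poisson_mle(counts):
    """Maximum likelihood estimate of a Poisson mean: the sample mean."""
    return sum(counts) / len(counts)

def poisson_cdf(k, lam):
    """P{X <= k} for X ~ Poisson(lam): sum of e^{-lam} lam^i / i! for i = 0..k."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))

accidents = [4, 0, 6, 5, 2, 1, 2, 0, 4, 3]   # Example 7.2d
lam_hat = poisson_mle(accidents)             # 2.7
print(lam_hat, round(poisson_cdf(2, lam_hat), 4))

# Example 7.2c reports only a 20-day total, so the estimate is total / 20.
retail_lam_hat = 857 / 20                    # 42.85
```

Only the total count matters for $\hat{\lambda}$, which is why Example 7.2c can be solved without the daily figures.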
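
EXAMPLE 7.2e Maximum Likelihood Estimator in a Normal Population Suppose $X_1, \ldots, X_n$ 
are independent, normal random variables each with unknown mean $\mu$ and unknown 
standard deviation $\sigma$. The joint density is given by 

$$
\begin{aligned}
f(x_1, \ldots, x_n \mid \mu, \sigma) &= \prod_{i=1}^n \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ \frac{-(x_i - \mu)^2}{2\sigma^2} \right\} \\
&= \left( \frac{1}{2\pi} \right)^{n/2} \frac{1}{\sigma^n} \exp\left\{ \frac{-\sum_{i=1}^n (x_i - \mu)^2}{2\sigma^2} \right\}
\end{aligned}
$$

The logarithm of the likelihood is thus given by 

$$
\log f(x_1, \ldots, x_n \mid \mu, \sigma) = -\frac{n}{2}\log(2\pi) - n \log \sigma - \frac{\sum_{i=1}^n (x_i - \mu)^2}{2\sigma^2}
$$

In order to find the value of $\mu$ and $\sigma$ maximizing the foregoing, we compute 

$$
\begin{aligned}
\frac{\partial}{\partial \mu} \log f(x_1, \ldots, x_n \mid \mu, \sigma) &= \frac{\sum_{i=1}^n (x_i - \mu)}{\sigma^2} \\
\frac{\partial}{\partial \sigma} \log f(x_1, \ldots, x_n \mid \mu, \sigma) &= -\frac{n}{\sigma} + \frac{\sum_{i=1}^n (x_i - \mu)^2}{\sigma^3}
\end{aligned}
$$

Equating these equations to zero yields that 

$$
\hat{\mu} = \sum_{i=1}^n x_i / n
$$

and 

$$
\hat{\sigma} = \left[ \sum_{i=1}^n (x_i - \hat{\mu})^2 / n \right]^{1/2}
$$

Hence, the maximum likelihood estimators of $\mu$ and $\sigma$ are given, respectively, by 

$$
\bar{X} \quad \text{and} \quad \left[ \sum_{i=1}^n (X_i - \bar{X})^2 / n \right]^{1/2} \tag{7.2.3}
$$

It should be noted that the maximum likelihood estimator of the standard deviation $\sigma$ 
differs from the sample standard deviation 

$$
S = \left[ \sum_{i=1}^n (X_i - \bar{X})^2 / (n - 1) \right]^{1/2}
$$

in that the denominator in Equation 7.2.3 is $\sqrt{n}$ rather than $\sqrt{n - 1}$. However, for $n$ of 
reasonable size, these two estimators of $\sigma$ will be approximately equal. ■ 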
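Python's `statistics` module happens to provide both divisors: `pstdev` divides by $n$ (the maximum likelihood estimator of $\sigma$) and `stdev` divides by $n - 1$ (the sample standard deviation $S$). The following minimal sketch borrows, as an arbitrary illustration, the nine signal readings that appear later in Example 7.3a.

```python
import statistics

xs = [5, 8.5, 12, 15, 7, 9, 7.5, 6.5, 10.5]   # signal readings from Example 7.3a

mu_hat = statistics.fmean(xs)        # MLE of mu: the sample mean
sigma_hat = statistics.pstdev(xs)    # MLE of sigma: divisor n
s = statistics.stdev(xs)             # sample standard deviation: divisor n - 1
n = len(xs)
print(mu_hat, round(sigma_hat, 4), round(s, 4))
print(round(sigma_hat / s, 4), round(((n - 1) / n) ** 0.5, 4))   # ratio is sqrt((n-1)/n)
```

Here $\sum (x_i - \bar{x})^2 = 76$, so the two estimates are $\sqrt{76/9}$ and $\sqrt{76/8}$. Their ratio $\sqrt{(n-1)/n}$ tends to 1 as $n$ grows.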

EXAMPLE 7.2f Kolmogorov's law of fragmentation states that the size of an individual particle 
in a large collection of particles resulting from the fragmentation of a mineral compound 
will have an approximate lognormal distribution, where a random variable $X$ is said to 
have a lognormal distribution if $\log(X)$ has a normal distribution. The law, which was 
first noted empirically and then later given a theoretical basis by Kolmogorov, has been 
applied to a variety of engineering studies. For instance, it has been used in the analysis of 
the size of randomly chosen gold particles from a collection of gold sand. A less obvious 
application of the law has been to a study of the stress release in earthquake fault zones 
(see Lomnitz, C., "Global Tectonics and Earthquake Risk," Developments in Geotectonics, 
Elsevier, Amsterdam, 1979). 

Suppose that a sample of 10 grains of metallic sand taken from a large sand pile has 
respective lengths (in millimeters): 

2.2, 3.4, 1.6, 0.8, 2.7, 3.3, 1.6, 2.8, 2.5, 1.9 

Estimate the percentage of sand grains in the entire pile whose length is between 2 and 3 
mm. 

SOLUTION Taking the natural logarithm of these 10 data values, the following transformed 
data set results: 

.7885, 1.2238, .4700, -.2231, .9933, 1.1939, .4700, 1.0296, .9163, .6419 

Because the sample mean and sample standard deviation of these data are 

$$
\bar{x} = .7504, \qquad s = .4351
$$

it follows that the logarithm of the length of a randomly chosen grain has a normal 
distribution with mean approximately equal to .7504 and with standard deviation 
approximately equal to .4351. Hence, if $X$ is the length of the grain, then 

$$
\begin{aligned}
P\{2 < X < 3\} &= P\{\log(2) < \log(X) < \log(3)\} \\
&= P\left\{ \frac{\log(2) - .7504}{.4351} < \frac{\log(X) - .7504}{.4351} < \frac{\log(3) - .7504}{.4351} \right\} \\
&= P\left\{ -.1316 < \frac{\log(X) - .7504}{.4351} < .8003 \right\} \\
&\approx \Phi(.8003) - \Phi(-.1316) \\
&= .3405
\end{aligned}
$$

■ 
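The same calculation can be reproduced with the standard library's `statistics.NormalDist`. The following minimal sketch follows the text in fitting the normal distribution of $\log(X)$ with the sample mean and the sample standard deviation (divisor $n - 1$). Because it uses unrounded logarithms, the final probability may differ from .3405 in the fourth decimal place.

```python
import math
import statistics
from statistics import NormalDist

lengths = [2.2, 3.4, 1.6, 0.8, 2.7, 3.3, 1.6, 2.8, 2.5, 1.9]   # mm, Example 7.2f
logs = [math.log(x) for x in lengths]
m, s = statistics.fmean(logs), statistics.stdev(logs)

# log(X) is modeled as normal with mean m and standard deviation s.
fitted = NormalDist(mu=m, sigma=s)
prob = fitted.cdf(math.log(3)) - fitted.cdf(math.log(2))
print(round(m, 4), round(s, 4), round(prob, 4))
```

Working on the log scale is what makes the lognormal model tractable: the probability for $X$ becomes an ordinary normal probability for $\log(X)$.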

In all of the foregoing examples, the maximum likelihood estimator of the population 
mean turned out to be the sample mean X. To show that this is not always the situation, 
consider the following example. 

EXAMPLE 7.2g Estimating the Mean of a Uniform Distribution Suppose $X_1, \ldots, X_n$ 
constitute a sample from a uniform distribution on $(0, \theta)$, where $\theta$ is unknown. Their joint 
density is thus 

$$
f(x_1, x_2, \ldots, x_n \mid \theta) = \begin{cases} \dfrac{1}{\theta^n} & 0 < x_i < \theta,\ i = 1, \ldots, n \\ 0 & \text{otherwise} \end{cases}
$$

This density is maximized by choosing $\theta$ as small as possible. Since $\theta$ must be at least as 
large as all of the observed values $x_i$, it follows that the smallest possible choice of $\theta$ is equal 
to $\max(x_1, x_2, \ldots, x_n)$. Hence, the maximum likelihood estimator of $\theta$ is 

$$
\hat{\theta} = \max(X_1, X_2, \ldots, X_n)
$$

It easily follows from the foregoing that the maximum likelihood estimator of $\theta/2$, the 
mean of the distribution, is $\max(X_1, X_2, \ldots, X_n)/2$. ■ 
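The shape of this likelihood (zero below the largest observation, then decreasing like $\theta^{-n}$) is easy to see numerically. The following is a minimal sketch with illustrative data. It assumes the support is the closed interval $[0, \theta]$, so that the likelihood at $\theta = \max x_i$ is $1/\theta^n$ rather than 0.

```python
def uniform_likelihood(theta, xs):
    """Joint density of i.i.d. uniform(0, theta) observations.

    Assumption: the support is treated as the closed interval [0, theta],
    so the likelihood at theta = max(xs) equals 1 / theta**n.
    """
    if theta <= 0 or any(x < 0 or x > theta for x in xs):
        return 0.0
    return theta ** (-len(xs))

xs = [0.7, 2.9, 1.4, 2.2]      # illustrative data, not from the text
theta_hat = max(xs)            # MLE of theta
mean_hat = theta_hat / 2       # MLE of the mean theta/2
for theta in (2.8, theta_hat, 3.0, 4.0):
    print(theta, uniform_likelihood(theta, xs))
```

Unlike the earlier examples, the maximum sits at the edge of the feasible region, so setting a derivative to zero would not find it.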

*7.2.1 Estimating Life Distributions 

* Optional section. 

Let $X$ denote the age at death of a randomly chosen child born today. That is, $X = i$ if 
the newborn dies in its $i$th year, $i \ge 1$. To estimate the probability mass function of $X$, 
let $\lambda_i$ denote the probability that a newborn who has survived his or her first $i - 1$ years 
dies in year $i$. That is, 

$$
\lambda_i = P\{X = i \mid X > i - 1\} = \frac{P\{X = i\}}{P\{X > i - 1\}}
$$

Also, let 

$$
s_i = 1 - \lambda_i = \frac{P\{X > i\}}{P\{X > i - 1\}}
$$

be the probability that a newborn who survives her first $i - 1$ years also survives year $i$. 
The quantity $\lambda_i$ is called the failure rate, and $s_i$ is called the survival rate, of an individual 
who is entering his or her $i$th year. Now, 

$$
s_1 s_2 \cdots s_i = P\{X > 1\} \frac{P\{X > 2\}}{P\{X > 1\}} \frac{P\{X > 3\}}{P\{X > 2\}} \cdots \frac{P\{X > i\}}{P\{X > i - 1\}} = P\{X > i\}
$$

Therefore, 

$$
P\{X = n\} = P\{X > n - 1\} \lambda_n = s_1 \cdots s_{n-1}(1 - s_n)
$$

Consequently, we can estimate the probability mass function of $X$ by estimating the 
quantities $s_i$, $i = 1, \ldots, n$. The value $s_i$ can be estimated by looking at all individuals 
in the population who reached age $i$ one year ago, and then letting the estimate $\hat{s}_i$ be 
the fraction of them who are alive today. We would then use $\hat{s}_1 \hat{s}_2 \cdots \hat{s}_{n-1}(1 - \hat{s}_n)$ as the 
estimate of $P\{X = n\}$. (Note that although we are using the most recent possible data to 
estimate the quantities $s_i$, our estimate of the probability mass function of the lifetime of 
a newborn assumes that the survival rate of the newborn when it reaches age $i$ will be the 
same as last year's survival rate of someone of age $i$.) 

The use of the survival rate to estimate a life distribution is also of importance in health 
studies with partial information. For instance, consider a study in which a new drug is 
given to a random sample of 12 lung cancer patients. Suppose that after some time we 
have the following data on the number of months of survival after starting the new drug: 

4, 7*, 9, 11*, 12, 3, 14*, 1, 8, 7, 5, 3* 

where x means that the patient died in month x after starting the drug treatment, and x* 
means that the patient has taken the drug for x months and is still alive. 

Let $X$ equal the number of months of survival after beginning the drug treatment, and 
let 

$$
s_i = P\{X > i \mid X > i - 1\} = \frac{P\{X > i\}}{P\{X > i - 1\}}
$$

To estimate $s_i$, the probability that a patient who has survived the first $i - 1$ months will 
also survive month $i$, we should take the fraction of those patients who began their $i$th 
month of drug taking and survived the month. For instance, because 11 of the 12 patients 
survived month 1, $\hat{s}_1 = 11/12$. Because all 11 patients who began month 2 survived, 
$\hat{s}_2 = 11/11$. Because 10 of the 11 patients who began month 3 survived, $\hat{s}_3 = 10/11$. 
Because 8 of the 9 patients who began their fourth month of taking the drug (all but the 
ones labelled 1, 3, and 3*) survived month 4, $\hat{s}_4 = 8/9$. Similar reasoning holds for the 
others, giving the following survival rate estimates: 

$$
\begin{aligned}
\hat{s}_1 &= 11/12 & \hat{s}_8 &= 4/5 \\
\hat{s}_2 &= 11/11 & \hat{s}_9 &= 3/4 \\
\hat{s}_3 &= 10/11 & \hat{s}_{10} &= 3/3 \\
\hat{s}_4 &= 8/9 & \hat{s}_{11} &= 3/3 \\
\hat{s}_5 &= 7/8 & \hat{s}_{12} &= 1/2 \\
\hat{s}_6 &= 7/7 & \hat{s}_{13} &= 1/1 \\
\hat{s}_7 &= 6/7 & \hat{s}_{14} &= 1/1
\end{aligned}
$$

We can now use $\prod_{i=1}^{j} \hat{s}_i$ to estimate the probability that a drug taker survives at least $j$ 
time periods, $j = 1, \ldots, 14$. For instance, our estimate of $P\{X > 6\}$ is 35/54. 
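The month-by-month bookkeeping above is mechanical, so it is a natural candidate for code. The following minimal sketch encodes each patient as a hypothetical `(months, still_alive)` pair, with `still_alive=True` for the starred entries, and uses exact fractions so the results match the text. A patient observed for at least $i$ months began month $i$, and a death in month $i$ is an unstarred entry equal to $i$. This product-limit calculation is the same idea that is widely known as the Kaplan-Meier estimator.

```python
from fractions import Fraction

# (months, still_alive) for the 12 patients; still_alive=True marks starred entries.
patients = [(4, False), (7, True), (9, False), (11, True), (12, False), (3, False),
            (14, True), (1, False), (8, False), (7, False), (5, False), (3, True)]

def survival_rates(patients, max_month):
    """s_i = (patients beginning month i who survive it) / (patients beginning it)."""
    rates = {}
    for i in range(1, max_month + 1):
        # Anyone observed for at least i months (dead or alive) began month i.
        at_risk = sum(1 for t, _ in patients if t >= i)
        # Deaths in month i are the unstarred entries equal to i.
        deaths = sum(1 for t, alive in patients if t == i and not alive)
        rates[i] = Fraction(at_risk - deaths, at_risk)
    return rates

s_hat = survival_rates(patients, 14)

# Estimated P{X > 6} is the product s_1 s_2 ... s_6.
surv_6 = Fraction(1)
for i in range(1, 7):
    surv_6 *= s_hat[i]
print(surv_6)   # 35/54
```

Starred patients contribute to the denominators for every month they were observed but never to the deaths. This is how the partial information is used without being discarded.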



7.3 INTERVAL ESTIMATES 

Suppose that $X_1, \ldots, X_n$ is a sample from a normal population having unknown mean $\mu$ 
and known variance $\sigma^2$. It has been shown that $\bar{X} = \sum_{i=1}^n X_i/n$ is the maximum likelihood 
estimator for $\mu$. However, we don't expect that the sample mean $\bar{X}$ will exactly equal $\mu$, 
but rather that it will "be close." Hence, rather than a point estimate, it is sometimes more 
valuable to be able to specify an interval for which we have a certain degree of confidence 
that $\mu$ lies within. To obtain such an interval estimator, we make use of the probability 
distribution of the point estimator. Let us see how it works for the preceding situation. 

In the foregoing, since the point estimator $\bar{X}$ is normal with mean $\mu$ and variance $\sigma^2/n$, 
it follows that 

$$
\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \sqrt{n}\,\frac{\bar{X} - \mu}{\sigma}
$$

has a standard normal distribution. Therefore, 

$$
P\left\{ -1.96 < \sqrt{n}\,\frac{\bar{X} - \mu}{\sigma} < 1.96 \right\} = .95
$$

or, equivalently, 

$$
P\left\{ -1.96\frac{\sigma}{\sqrt{n}} < \bar{X} - \mu < 1.96\frac{\sigma}{\sqrt{n}} \right\} = .95
$$

Multiplying through by $-1$ yields the equivalent statement 

$$
P\left\{ -1.96\frac{\sigma}{\sqrt{n}} < \mu - \bar{X} < 1.96\frac{\sigma}{\sqrt{n}} \right\} = .95
$$

or, equivalently, 

$$
P\left\{ \bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\frac{\sigma}{\sqrt{n}} \right\} = .95
$$

That is, 95 percent of the time $\mu$ will lie within $1.96\sigma/\sqrt{n}$ units of the sample average. If 
we now observe the sample and it turns out that $\bar{X} = \bar{x}$, then we say that "with 95 percent 
confidence" 

$$
\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\frac{\sigma}{\sqrt{n}} \tag{7.3.1}
$$

That is, "with 95 percent confidence" we assert that the true mean lies within $1.96\sigma/\sqrt{n}$ 
of the observed sample mean. The interval 

$$
\left( \bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\frac{\sigma}{\sqrt{n}} \right)
$$

is called a 95 percent confidence interval estimate of $\mu$. 

EXAMPLE 7.3a Suppose that when a signal having value $\mu$ is transmitted from location A 
the value received at location B is normally distributed with mean $\mu$ and variance 4. That 
is, if $\mu$ is sent, then the value received is $\mu + N$ where $N$, representing noise, is normal 
with mean 0 and variance 4. To reduce error, suppose the same value is sent 9 times. If 
the successive values received are 5, 8.5, 12, 15, 7, 9, 7.5, 6.5, 10.5, let us construct a 
95 percent confidence interval for $\mu$. 

Since 

$$
\bar{x} = \frac{81}{9} = 9
$$

it follows, under the assumption that the values received are independent, that a 95 percent 
confidence interval for $\mu$ is 

$$
\left( 9 - 1.96\cdot\frac{2}{3},\ 9 + 1.96\cdot\frac{2}{3} \right) = (7.69, 10.31)
$$

Hence, we are "95 percent confident" that the true message value lies between 7.69 and 
10.31. ■ 
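The interval in Example 7.3a can be sketched with `statistics.NormalDist().inv_cdf`, which returns the exact standard normal quantile $z_{.025} \approx 1.95996$ instead of the rounded table value 1.96. This is a minimal sketch, and the variable names are illustrative.

```python
import math
from statistics import NormalDist, fmean

received = [5, 8.5, 12, 15, 7, 9, 7.5, 6.5, 10.5]   # Example 7.3a
sigma = 2.0                                        # known: variance 4
z = NormalDist().inv_cdf(0.975)                    # z_{.025}, about 1.96
half_width = z * sigma / math.sqrt(len(received))
xbar = fmean(received)
ci = (xbar - half_width, xbar + half_width)
print([round(v, 2) for v in ci])                   # [7.69, 10.31]
```

Because $\sigma$ is known, the data enter only through $\bar{x}$. The half-width $1.96\sigma/\sqrt{n}$ is fixed before any value is received.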

The interval in Equation 7.3.1 is called a two-sided confidence interval. Sometimes, 
however, we are interested in determining a value so that we can assert with, say, 95 
percent confidence, that $\mu$ is at least as large as that value. 

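To determine such a value, note that if $Z$ is a standard normal random variable then 

$$
P\{Z < 1.645\} = .95
$$

As a result, 

$$
P\left\{ \sqrt{n}\,\frac{\bar{X} - \mu}{\sigma} < 1.645 \right\} = .95
$$

or 

$$
P\left\{ \bar{X} - 1.645\frac{\sigma}{\sqrt{n}} < \mu \right\} = .95
$$

Thus, a 95 percent one-sided upper confidence interval for $\mu$ is 

$$
\left( \bar{x} - 1.645\frac{\sigma}{\sqrt{n}},\ \infty \right)
$$

where $\bar{x}$ is the observed value of the sample mean. 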

A one-sided lower confidence interval is obtained similarly; when the observed value of 
the sample mean is $\bar{x}$, then the 95 percent one-sided lower confidence interval for $\mu$ is 

$$
\left( -\infty,\ \bar{x} + 1.645\frac{\sigma}{\sqrt{n}} \right)
$$

EXAMPLE 7.3b Determine the upper and lower 95 percent confidence interval estimates 
of $\mu$ in Example 7.3a. 

SOLUTION Since 

$$
1.645\frac{\sigma}{\sqrt{n}} = \frac{3.29}{3} = 1.097
$$

the 95 percent upper confidence interval is 

$$
(9 - 1.097, \infty) = (7.903, \infty)
$$

and the 95 percent lower confidence interval is 

$$
(-\infty, 9 + 1.097) = (-\infty, 10.097)
$$

■ 

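We can also obtain confidence intervals of any specified level of confidence. To do so, 
recall that $z_\alpha$ is such that 

$$
P\{Z > z_\alpha\} = \alpha
$$

when $Z$ is a standard normal random variable. But this implies (see Figure 7.1) that for 
any $\alpha$ 

$$
P\{-z_{\alpha/2} < Z < z_{\alpha/2}\} = 1 - \alpha
$$

As a result, we see that 

$$
P\left\{ -z_{\alpha/2} < \sqrt{n}\,\frac{\bar{X} - \mu}{\sigma} < z_{\alpha/2} \right\} = 1 - \alpha
$$

or 

$$
P\left\{ -z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \bar{X} - \mu < z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \right\} = 1 - \alpha
$$

That is, 

$$
P\left\{ -z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu - \bar{X} < z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \right\} = 1 - \alpha
$$

or, equivalently, 

$$
P\left\{ \bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \right\} = 1 - \alpha
$$

FIGURE 7.1 $P\{-z_{\alpha/2} < Z < z_{\alpha/2}\} = 1 - \alpha$. 

Hence, a $100(1 - \alpha)$ percent two-sided confidence interval for $\mu$ is 

$$
\left( \bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \right)
$$

where $\bar{x}$ is the observed sample mean. 

Similarly, knowing that $Z = \sqrt{n}\,(\bar{X} - \mu)/\sigma$ is a standard normal random variable, along 
with the identities 

$$
P\{Z > z_\alpha\} = \alpha
$$

and 

$$
P\{Z < -z_\alpha\} = \alpha
$$

results in one-sided confidence intervals of any desired level of confidence. Specifically, we 
obtain that 

$$
\left( \bar{x} - z_\alpha\frac{\sigma}{\sqrt{n}},\ \infty \right)
$$

and 

$$
\left( -\infty,\ \bar{x} + z_\alpha\frac{\sigma}{\sqrt{n}} \right)
$$

are, respectively, $100(1 - \alpha)$ percent one-sided upper and $100(1 - \alpha)$ percent one-sided 
lower confidence intervals for $\mu$. 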

EXAMPLE 7.3c Use the data of Example 7.3a to obtain a 99 percent confidence interval 
estimate of $\mu$, along with 99 percent one-sided upper and lower intervals. 

SOLUTION Since $z_{.005} = 2.58$, and 

$$
2.58\frac{\sigma}{\sqrt{n}} = \frac{5.16}{3} = 1.72
$$

it follows that a 99 percent confidence interval for $\mu$ is 

$$
9 \pm 1.72
$$

That is, the 99 percent confidence interval estimate is (7.28, 10.72). 
Also, since $z_{.01} = 2.33$, a 99 percent upper confidence interval is 

$$
(9 - 2.33(2/3), \infty) = (7.447, \infty)
$$

Similarly, a 99 percent lower confidence interval is 

$$
(-\infty, 9 + 2.33(2/3)) = (-\infty, 10.553)
$$

■ 
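All three interval types for a known $\sigma$ can be packaged in one helper. The following minimal sketch uses a hypothetical function `z_intervals` and reproduces Examples 7.3b and 7.3c. It uses exact normal quantiles (for example, $z_{.005} \approx 2.5758$ and $z_{.01} \approx 2.3263$), so the third decimal can differ slightly from the text's table-based values such as 7.447.

```python
import math
from statistics import NormalDist

def z_intervals(xbar, sigma, n, alpha):
    """Two-sided, one-sided upper, and one-sided lower 100(1 - alpha)% intervals (sigma known)."""
    se = sigma / math.sqrt(n)
    z_half = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    z_one = NormalDist().inv_cdf(1 - alpha)        # z_alpha
    two_sided = (xbar - z_half * se, xbar + z_half * se)
    upper = (xbar - z_one * se, math.inf)          # "upper" interval, in the text's terms
    lower = (-math.inf, xbar + z_one * se)         # "lower" interval, in the text's terms
    return two_sided, upper, lower

# Examples 7.3b and 7.3c: xbar = 9, sigma = 2, n = 9.
print(z_intervals(9, 2, 9, 0.05))
print(z_intervals(9, 2, 9, 0.01))
```

The one-sided intervals use $z_\alpha$ rather than $z_{\alpha/2}$ because all of the error probability $\alpha$ is placed in a single tail.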

Sometimes we are interested in a two-sided confidence interval of a certain level, say 
$1 - \alpha$, and the problem is to choose the sample size $n$ so that the interval is of a certain 
size. For instance, suppose that we want to compute an interval of length .1 that we can 
assert, with 99 percent confidence, contains $\mu$. How large need $n$ be? To solve this, note 
that as $z_{.005} = 2.58$ it follows that the 99 percent confidence interval for $\mu$ from a sample 
of size $n$ is 

$$
\left( \bar{x} - 2.58\frac{\sigma}{\sqrt{n}},\ \bar{x} + 2.58\frac{\sigma}{\sqrt{n}} \right)
$$

Hence, its length is 

$$
5.16\frac{\sigma}{\sqrt{n}}
$$

Thus, to make the length of the interval equal to .1, we must choose 

$$
5.16\frac{\sigma}{\sqrt{n}} = .1
$$

or 

$$
n = (51.6\sigma)^2
$$

REMARK 

The interpretation of "a $100(1 - \alpha)$ percent confidence interval" can be confusing. It 
should be noted that we are not asserting that the probability that $\mu \in (\bar{x} - 1.96\sigma/\sqrt{n},\ \bar{x} + 
1.96\sigma/\sqrt{n})$ is .95, for there are no random variables involved in this assertion. What we 
are asserting is that the technique utilized to obtain this interval is such that 95 percent of 
the time that it is employed it will result in an interval in which $\mu$ lies. In other words, 
before the data are observed we can assert that with probability .95 the interval that will 
be obtained will contain $\mu$, whereas after the data are obtained we can only assert that 
the resultant interval indeed contains $\mu$ "with confidence .95." 
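The "95 percent of the time" reading can be checked by simulation. The following minimal sketch assumes the setting of Example 7.3a ($\sigma = 2$, $n = 9$) and, purely for the simulation, treats $\mu = 9$ as the true mean. It draws many samples with Python's `random` module (the seed is arbitrary) and records how often the interval $\bar{x} \pm 1.96\sigma/\sqrt{n}$ contains $\mu$.

```python
import math
import random

def coverage(mu=9.0, sigma=2.0, n=9, reps=10_000, seed=1):
    """Fraction of simulated 95% z-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    half_width = 1.96 * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / reps

print(coverage())   # close to .95
```

Each individual interval either contains $\mu$ or it does not. The .95 describes the long-run behavior of the procedure, which is exactly the distinction the remark draws.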

EXAMPLE 7.3d From past experience it is known that the weights of salmon grown at 
a commercial hatchery are normal with a mean that varies from season to season but with 
a standard deviation that remains fixed at 0.3 pounds. If we want to be 95 percent certain 
that our estimate of the present season's mean weight of a salmon is correct to within 
±0.1 pounds, how large a sample is needed? 

SOLUTION A 95 percent confidence interval estimate for the unknown mean $\mu$, based on 
a sample of size $n$, is 

$$
\mu \in \left( \bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\frac{\sigma}{\sqrt{n}} \right)
$$

Because the estimate $\bar{x}$ is within $1.96(\sigma/\sqrt{n}) = .588/\sqrt{n}$ of any point in the interval, it 
follows that we can be 95 percent certain that $\bar{x}$ is within 0.1 of $\mu$ provided that 

$$
\frac{.588}{\sqrt{n}} \le 0.1
$$

That is, provided that 

$$
\sqrt{n} \ge 5.88
$$

or 

$$
n \ge 34.57
$$

That is, a sample size of 35 or larger will suffice. ■ 
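Both sample-size calculations in this section come down to rounding $(z\sigma/E)^2$ up to an integer, where $E$ is the desired half-width. The following minimal sketch uses a hypothetical helper name. It covers Example 7.3d and, with $E = .05$ and $z = 2.58$, the earlier interval of total length .1.

```python
import math

def sample_size_for_margin(sigma, margin, z=1.96):
    """Smallest integer n with z * sigma / sqrt(n) <= margin."""
    return math.ceil((z * sigma / margin) ** 2)

print(sample_size_for_margin(0.3, 0.1))             # Example 7.3d: 35
print(sample_size_for_margin(1.0, 0.05, z=2.58))    # (51.6 * sigma)^2 with sigma = 1, rounded up
```

Rounding up rather than to the nearest integer matters: $n = 34$ would give a half-width slightly larger than 0.1.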

7.3.1 Confidence Interval for a Normal Mean When the Variance Is Unknown 

Suppose now that $X_1, \ldots, X_n$ is a sample from a normal distribution with unknown mean 
$\mu$ and unknown variance $\sigma^2$, and that we wish to construct a $100(1 - \alpha)$ percent confidence 
interval for $\mu$. Since $\sigma$ is unknown, we can no longer base our interval on the fact that 
$\sqrt{n}(\bar{X} - \mu)/\sigma$ is a standard normal random variable. However, by letting $S^2 = \sum_{i=1}^n (X_i - 
\bar{X})^2/(n - 1)$ denote the sample variance, then from Corollary 6.5.2 it follows that 

$$
\sqrt{n}\,\frac{\bar{X} - \mu}{S}
$$

is a $t$-random variable with $n - 1$ degrees of freedom. Hence, from the symmetry of the 
$t$-density function (see Figure 7.2), we have that for any $\alpha \in (0, 1/2)$, 

$$
P\left\{ -t_{\alpha/2,n-1} < \sqrt{n}\,\frac{\bar{X} - \mu}{S} < t_{\alpha/2,n-1} \right\} = 1 - \alpha
$$

or, equivalently, 

$$
P\left\{ \bar{X} - t_{\alpha/2,n-1}\frac{S}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2,n-1}\frac{S}{\sqrt{n}} \right\} = 1 - \alpha
$$

Thus, if it is observed that $\bar{X} = \bar{x}$ and $S = s$, then we can say that "with $100(1 - \alpha)$ 
percent confidence" 

$$
\mu \in \left( \bar{x} - t_{\alpha/2,n-1}\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2,n-1}\frac{s}{\sqrt{n}} \right)
$$

FIGURE 7.2 $t$-density function, showing $P\{-t_{\alpha/2,n-1} < T_{n-1} < t_{\alpha/2,n-1}\} = 1 - \alpha$, with area $\alpha/2$ in each tail. 



EXAMPLE 7.3e Let us again consider Example 7.3a but let us now suppose that when the 
value $\mu$ is transmitted at location A then the value received at location B is normal with 
mean $\mu$ and variance $\sigma^2$ but with $\sigma^2$ being unknown. If 9 successive values are, as in 
Example 7.3a, 5, 8.5, 12, 15, 7, 9, 7.5, 6.5, 10.5, compute a 95 percent confidence 
interval for $\mu$. 

SOLUTION A simple calculation yields that 

$$
\bar{x} = 9
$$

and 

$$
s^2 = \frac{\sum x_i^2 - 9(\bar{x})^2}{8} = 9.5
$$

or 

$$
s = 3.082
$$

Hence, as $t_{.025,8} = 2.306$, a 95 percent confidence interval for $\mu$ is 

$$
\left( 9 - 2.306\,\frac{3.082}{3},\ 9 + 2.306\,\frac{3.082}{3} \right) = (6.63, 11.37)
$$



a larger interval than obtained in Example 7.3a. The reason why the interval just obtained 
is larger than the one in Example 7.3a is twofold. The primary reason is that we have 
a larger estimated variance than in Example 7.3a. That is, in Example 7.3a we assumed 
that $\sigma^2$ was known to equal 4, whereas in this example we assumed it to be unknown 
and our estimate of it turned out to be 9.5, which resulted in a larger confidence interval. 
In fact, the confidence interval would have been larger than in Example 7.3a even if our 
estimate of $\sigma^2$ was again 4 because by having to estimate the variance we need to utilize 
the $t$-distribution, which has a greater variance and thus a larger spread than the standard 
normal (which can be used when $\sigma^2$ is assumed known). For instance, if it had turned out 
that $\bar{x} = 9$ and $s^2 = 4$, then our confidence interval would have been 

$$
\left( 9 - 2.306 \cdot \tfrac{2}{3},\ 9 + 2.306 \cdot \tfrac{2}{3} \right) = (7.46, 10.54)
$$

which is larger than that obtained in Example 7.3a. ■ 
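The following minimal sketch of the $t$ interval in Example 7.3e relies on Python's standard library, which has no $t$ quantile function. So the critical value $t_{.025,8} = 2.306$ is passed in by hand from the text. If SciPy is available, `scipy.stats.t.ppf(0.975, 8)` returns the same quantile. The function name `t_interval` is illustrative.

```python
import math
import statistics

def t_interval(xs, t_crit):
    """Two-sided interval xbar -/+ t_crit * s / sqrt(n), with t_crit = t_{alpha/2, n-1}."""
    n = len(xs)
    xbar = statistics.fmean(xs)
    s = statistics.stdev(xs)          # sample standard deviation, divisor n - 1
    hw = t_crit * s / math.sqrt(n)
    return xbar - hw, xbar + hw

received = [5, 8.5, 12, 15, 7, 9, 7.5, 6.5, 10.5]
lo, hi = t_interval(received, t_crit=2.306)   # t_{.025,8} = 2.306
print(round(lo, 2), round(hi, 2))             # 6.63 11.37
```

Compared with the known-$\sigma$ sketch, two things change: $s$ replaces $\sigma$, and the $t$ quantile replaces $z_{.025}$.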

REMARKS 

(a) The confidence interval for $\mu$ when $\sigma$ is known is based on the fact that $\sqrt{n}(\bar{X} - \mu)/\sigma$ has a standard normal distribution. When $\sigma$ is unknown, the foregoing approach is to estimate it by $S$ and then use the fact that $\sqrt{n}(\bar{X} - \mu)/S$ has a $t$-distribution with $n - 1$ degrees of freedom.

(b) The length of a $100(1 - \alpha)$ percent confidence interval for $\mu$ is not always larger when the variance is unknown. For the length of such an interval is $2z_{\alpha/2}\sigma/\sqrt{n}$ when $\sigma$ is known, whereas it is $2t_{\alpha/2,n-1}S/\sqrt{n}$ when $\sigma$ is unknown; and it is certainly possible that the sample standard deviation $S$ can turn out to be much smaller than $\sigma$. However, it can be shown that the mean length of the interval is longer when $\sigma$ is unknown. That is, it can be shown that

$$t_{\alpha/2,n-1}E[S] > z_{\alpha/2}\sigma$$

Indeed, $E[S]$ is evaluated in Chapter 14 and it is shown, for instance, that

$$E[S] = \begin{cases} .94\sigma & \text{when } n = 5 \\ .97\sigma & \text{when } n = 9 \end{cases}$$

Since

$$z_{.025} = 1.96, \qquad t_{.025,4} = 2.78, \qquad t_{.025,8} = 2.31$$

the length of a 95 percent confidence interval from a sample of size 5 is $2 \times 1.96\sigma/\sqrt{5} = 1.75\sigma$ when $\sigma$ is known, whereas its expected length is $2 \times 2.78 \times .94\sigma/\sqrt{5} = 2.34\sigma$ when $\sigma$ is unknown, an increase of 33.7 percent. If the sample is of size 9, then the two values to compare are $1.31\sigma$ and $1.49\sigma$, an increase of 13.7 percent. ■
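The constants .94 and .97 in Remark (b) can be checked numerically. The sketch below assumes the standard closed form $E[S] = \sigma\sqrt{2/(n-1)}\,\Gamma(n/2)/\Gamma((n-1)/2)$ for a normal sample (the Chapter 14 derivation is not reproduced here) and compares interval lengths in units of $\sigma$, using the percentiles quoted in the remark.

```python
import math


def expected_s_over_sigma(n):
    """E[S] / sigma for a normal sample of size n, using log-gamma for stability."""
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    return math.sqrt(2 / (n - 1)) * math.exp(log_ratio)


z_025 = 1.96
t_025 = {5: 2.78, 9: 2.31}  # t_{.025, n-1} as quoted in Remark (b)

for n, t in t_025.items():
    known = 2 * z_025 / math.sqrt(n)  # interval length / sigma when sigma is known
    unknown = 2 * t * expected_s_over_sigma(n) / math.sqrt(n)  # expected length / sigma
    print(n, round(expected_s_over_sigma(n), 2), round(known, 2), round(unknown, 2))
```

Rounded to two places this reproduces the remark's figures: .94, 1.75, 2.34 for $n = 5$ and .97, 1.31, 1.49 for $n = 9$.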

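A one-sided upper confidence interval can be obtained by noting that

$$P\left\{\sqrt{n}\,\frac{(\bar{X} - \mu)}{S} < t_{\alpha,n-1}\right\} = 1 - \alpha$$

or

$$P\left\{\bar{X} - \mu < \frac{S}{\sqrt{n}}\,t_{\alpha,n-1}\right\} = 1 - \alpha$$

or

$$P\left\{\mu > \bar{X} - \frac{S}{\sqrt{n}}\,t_{\alpha,n-1}\right\} = 1 - \alpha$$

Hence, if it is observed that $\bar{X} = \bar{x}$, $S = s$, then we can assert "with $100(1 - \alpha)$ percent confidence" that

$$\mu \in \left(\bar{x} - \frac{s}{\sqrt{n}}\,t_{\alpha,n-1},\; \infty\right)$$

Similarly, a $100(1 - \alpha)$ percent lower confidence interval would be

$$\mu \in \left(-\infty,\; \bar{x} + \frac{s}{\sqrt{n}}\,t_{\alpha,n-1}\right)$$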

Program 7.3.1 will compute both one- and two-sided confidence intervals for the mean 
of a normal distribution when the variance is unknown. 

EXAMPLE 7.3f Determine a 95 percent confidence interval for the average resting pulse of the members of a health club if a random selection of 15 members of the club yielded the data 54, 63, 58, 72, 49, 92, 70, 73, 69, 104, 48, 66, 80, 64, 77. Also determine a 95 percent lower confidence interval for this mean.

SOLUTION We use Program 7.3.1 to obtain the solution (see Figure 7.3). ■
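Program 7.3.1 itself is not reproduced here, but its output for Example 7.3f can be approximated with the sketch below. The percentiles $t_{.025,14} = 2.145$ and $t_{.05,14} = 1.761$ are standard table values typed in by hand, so the endpoints may differ from Figure 7.3 in the last digits.

```python
import math
import statistics

pulses = [54, 63, 58, 72, 49, 92, 70, 73, 69, 104, 48, 66, 80, 64, 77]
n = len(pulses)
xbar = statistics.mean(pulses)
se = statistics.stdev(pulses) / math.sqrt(n)  # estimated standard error s / sqrt(n)

# Two-sided 95 percent interval: xbar +/- t_{.025,14} * s / sqrt(n)
two_sided = (xbar - 2.145 * se, xbar + 2.145 * se)
# One-sided 95 percent lower interval: (-inf, xbar + t_{.05,14} * s / sqrt(n))
lower = (-math.inf, xbar + 1.761 * se)

print(two_sided, lower)
```

Note that the one-sided interval uses $t_{\alpha,n-1}$ rather than $t_{\alpha/2,n-1}$, which is why its finite endpoint lies inside the two-sided interval.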

Our derivations of the $100(1 - \alpha)$ percent confidence intervals for the population mean $\mu$ have assumed that the population distribution is normal. However, even when this is not the case, if the sample size is reasonably large then the intervals obtained will still be approximate $100(1 - \alpha)$ percent confidence intervals for $\mu$. This is true because, by the central limit theorem, $\sqrt{n}(\bar{X} - \mu)/\sigma$ will have approximately a normal distribution, and $\sqrt{n}(\bar{X} - \mu)/S$ will have approximately a $t$-distribution.

FIGURE 7.3 (a) Two-sided and (b) lower 95 percent confidence intervals for Example 7.3f. (Screen output of Program 7.3.1, "Confidence Interval: Unknown Variance," with sample size 15 and α = .05. (a) The 95% confidence interval for the mean is (60.865, 77.6683). (b) The 95% lower confidence interval for the mean is (-infinity, 76.1662).)

EXAMPLE 7.3g Simulation provides a powerful method for evaluating single and multidimensional integrals. For instance, let $f$ be a function of an $r$-valued vector $(y_1, \ldots, y_r)$, and suppose that we want to estimate the quantity $\theta$, defined by

$$\theta = \int_0^1 \int_0^1 \cdots \int_0^1 f(y_1, y_2, \ldots, y_r)\,dy_1\,dy_2 \cdots dy_r$$

To accomplish this, note that if $U_1, U_2, \ldots, U_r$ are independent uniform random variables on $(0, 1)$, then

$$\theta = E[f(U_1, U_2, \ldots, U_r)]$$

Now, the values of independent uniform (0, 1) random variables can be approximated on a computer (by so-called pseudo random numbers); if we generate a vector of $r$ of them, and evaluate $f$ at this vector, then the value obtained, call it $X_1$, will be a random variable with mean $\theta$. If we now repeat this process, then we obtain another value, call it $X_2$, which will have the same distribution as $X_1$. Continuing on, we can generate a sequence $X_1, X_2, \ldots, X_n$ of independent and identically distributed random variables with mean $\theta$; we then use their observed values to estimate $\theta$. This method of approximating integrals is called Monte Carlo simulation.

For instance, suppose we wanted to estimate the one-dimensional integral

$$\theta = \int_0^1 \sqrt{1 - y^2}\,dy = E\left[\sqrt{1 - U^2}\right]$$

where $U$ is a uniform (0, 1) random variable. To do so, let $U_1, \ldots, U_{100}$ be independent uniform (0, 1) random variables, and set

$$X_i = \sqrt{1 - U_i^2}, \qquad i = 1, \ldots, 100$$

In this way, we have generated a sample of 100 random variables having mean $\theta$. Suppose that the computer generated values of $U_1, \ldots, U_{100}$ resulting in $X_1, \ldots, X_{100}$ having sample mean .786 and sample standard deviation .03. Consequently, since $t_{.025,99} = 1.985$ and $s/\sqrt{n} = .03/\sqrt{100} = .003$, it follows that a 95 percent confidence interval for $\theta$ would be given by

$$.786 \pm 1.985(.003)$$

As a result, we could assert, with 95 percent confidence, that $\theta$ (which can be shown to equal $\pi/4$) is between .780 and .792. ■
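Example 7.3g can be tried directly with a pseudo random number generator. In this sketch the seed, the sample size of 100 and the helper name `monte_carlo_interval` are illustrative choices, and $t_{.025,99} = 1.985$ is taken from the text. Since the variance of $\sqrt{1 - U^2}$ is $2/3 - \pi^2/16 \approx .0498$, a run will typically report a sample standard deviation near .22 rather than the .03 supposed in the example, so the interval it prints is correspondingly wider.

```python
import math
import random
import statistics


def monte_carlo_interval(f, n, t_quantile, seed=0):
    """Estimate E[f(U)], U ~ uniform(0, 1), by a sample mean with a t-based interval."""
    rng = random.Random(seed)
    values = [f(rng.random()) for _ in range(n)]
    mean = statistics.mean(values)
    half_width = t_quantile * statistics.stdev(values) / math.sqrt(n)
    return mean, (mean - half_width, mean + half_width)


estimate, interval = monte_carlo_interval(lambda u: math.sqrt(1 - u * u), n=100, t_quantile=1.985)
print(estimate, interval, math.pi / 4)
```

Increasing `n` shrinks the half-width roughly in proportion to $1/\sqrt{n}$, which is the usual trade-off in Monte Carlo estimation.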

7.3.2 Confidence Intervals for the Variance of a 
Normal Distribution 

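If $X_1, \ldots, X_n$ is a sample from a normal distribution having unknown parameters $\mu$ and $\sigma^2$, then we can construct a confidence interval for $\sigma^2$ by using the fact that

$$(n - 1)\frac{S^2}{\sigma^2} \sim \chi^2_{n-1}$$

Hence,

$$P\left\{\chi^2_{1-\alpha/2,n-1} \le (n - 1)\frac{S^2}{\sigma^2} \le \chi^2_{\alpha/2,n-1}\right\} = 1 - \alpha$$

or, equivalently,

$$P\left\{\frac{(n - 1)S^2}{\chi^2_{\alpha/2,n-1}} \le \sigma^2 \le \frac{(n - 1)S^2}{\chi^2_{1-\alpha/2,n-1}}\right\} = 1 - \alpha$$

Hence when $S^2 = s^2$, a $100(1 - \alpha)$ percent confidence interval for $\sigma^2$ is

$$\left(\frac{(n - 1)s^2}{\chi^2_{\alpha/2,n-1}},\; \frac{(n - 1)s^2}{\chi^2_{1-\alpha/2,n-1}}\right)$$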

EXAMPLE 7.3h A standardized procedure is expected to produce washers with very small deviation in their thicknesses. Suppose that 10 such washers were chosen and measured. If the thicknesses of these washers were, in inches,

.123   .133   .124   .125   .126   .128   .120   .124   .130   .126

what is a 90 percent confidence interval for the standard deviation of the thickness of a washer produced by this procedure?

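SOLUTION A computation gives that

$$s^2 = 1.366 \times 10^{-5}$$

Because $\chi^2_{.05,9} = 16.917$ and $\chi^2_{.95,9} = 3.334$, and because

$$\frac{9 \times 1.366 \times 10^{-5}}{16.917} = 7.267 \times 10^{-6}, \qquad \frac{9 \times 1.366 \times 10^{-5}}{3.334} = 36.875 \times 10^{-6}$$

it follows that, with confidence .90,

$$\sigma^2 \in (7.267 \times 10^{-6},\; 36.875 \times 10^{-6})$$

Taking square roots yields that, with confidence .90,

$$\sigma \in (2.696 \times 10^{-3},\; 6.072 \times 10^{-3})$$ ■

One-sided confidence intervals for $\sigma^2$ are obtained by similar reasoning and are presented in Table 7.1, which sums up the results of this section.

TABLE 7.1 $100(1 - \alpha)$ Percent Confidence Intervals

$X_1, \ldots, X_n \sim N(\mu, \sigma^2)$, $\quad \bar{X} = \sum_{i=1}^n X_i/n$, $\quad S = \sqrt{\sum_{i=1}^n (X_i - \bar{X})^2/(n - 1)}$

Assumption | Parameter | Confidence Interval | Lower Interval | Upper Interval
--- | --- | --- | --- | ---
$\sigma^2$ known | $\mu$ | $\bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ | $\left(-\infty, \bar{X} + z_\alpha\frac{\sigma}{\sqrt{n}}\right)$ | $\left(\bar{X} - z_\alpha\frac{\sigma}{\sqrt{n}}, \infty\right)$
$\sigma^2$ unknown | $\mu$ | $\bar{X} \pm t_{\alpha/2,n-1}\frac{S}{\sqrt{n}}$ | $\left(-\infty, \bar{X} + t_{\alpha,n-1}\frac{S}{\sqrt{n}}\right)$ | $\left(\bar{X} - t_{\alpha,n-1}\frac{S}{\sqrt{n}}, \infty\right)$
$\mu$ unknown | $\sigma^2$ | $\left(\frac{(n-1)S^2}{\chi^2_{\alpha/2,n-1}}, \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,n-1}}\right)$ | $\left(0, \frac{(n-1)S^2}{\chi^2_{1-\alpha,n-1}}\right)$ | $\left(\frac{(n-1)S^2}{\chi^2_{\alpha,n-1}}, \infty\right)$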
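The interval for $\sigma$ in Example 7.3h can be reproduced with the sketch below. The chi-square percentiles are typed in by hand, because Python's standard library has no chi-square quantile function; the values used are the ones quoted in the example (standard tables give $\chi^2_{.95,9} \approx 3.325$, which would raise the upper endpoint for $\sigma^2$ slightly, to about $37.0 \times 10^{-6}$). The helper name is illustrative.

```python
import math
import statistics


def variance_interval(data, chi2_upper, chi2_lower):
    """((n-1)s^2 / chi2_{alpha/2,n-1}, (n-1)s^2 / chi2_{1-alpha/2,n-1}) for sigma^2."""
    n = len(data)
    ss = (n - 1) * statistics.variance(data)  # (n - 1) s^2, the sum of squared deviations
    return ss / chi2_upper, ss / chi2_lower


thickness = [.123, .133, .124, .125, .126, .128, .120, .124, .130, .126]
var_low, var_high = variance_interval(thickness, chi2_upper=16.917, chi2_lower=3.334)
sd_low, sd_high = math.sqrt(var_low), math.sqrt(var_high)
print(var_low, var_high)  # bounds for sigma^2
print(sd_low, sd_high)    # bounds for sigma
```

Taking square roots of the endpoints is legitimate because the square root is increasing, so the event $\sigma^2 \in (a, b)$ is the same as $\sigma \in (\sqrt{a}, \sqrt{b})$.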

7.4 ESTIMATING THE DIFFERENCE IN MEANS OF TWO 
NORMAL POPULATIONS 

Let $X_1, X_2, \ldots, X_n$ be a sample of size $n$ from a normal population having mean $\mu_1$ and variance $\sigma_1^2$ and let $Y_1, \ldots, Y_m$ be a sample of size $m$ from a different normal population having mean $\mu_2$ and variance $\sigma_2^2$, and suppose that the two samples are independent of each other. We are interested in estimating $\mu_1 - \mu_2$.

Since $\bar{X} = \sum_{i=1}^n X_i/n$ and $\bar{Y} = \sum_{i=1}^m Y_i/m$ are the maximum likelihood estimators of $\mu_1$ and $\mu_2$, it seems intuitive (and can be proven) that $\bar{X} - \bar{Y}$ is the maximum likelihood estimator of $\mu_1 - \mu_2$.

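To obtain a confidence interval estimator, we need the distribution of $\bar{X} - \bar{Y}$. Because

$$\bar{X} \sim N(\mu_1, \sigma_1^2/n), \qquad \bar{Y} \sim N(\mu_2, \sigma_2^2/m)$$

it follows from the fact that the sum of independent normal random variables is also normal, that

$$\bar{X} - \bar{Y} \sim N\left(\mu_1 - \mu_2,\; \frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}\right)$$

Hence, assuming $\sigma_1^2$ and $\sigma_2^2$ are known, we have that

$$\frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n} + \dfrac{\sigma_2^2}{m}}} \sim N(0, 1) \tag{7.4.1}$$

and

$$P\left\{-z_{\alpha/2} < \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n} + \dfrac{\sigma_2^2}{m}}} < z_{\alpha/2}\right\} = 1 - \alpha$$

or, equivalently,

$$P\left\{\bar{X} - \bar{Y} - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}} < \mu_1 - \mu_2 < \bar{X} - \bar{Y} + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}}\right\} = 1 - \alpha$$

Hence, if $\bar{X}$ and $\bar{Y}$ are observed to equal $\bar{x}$ and $\bar{y}$, respectively, then a $100(1 - \alpha)$ percent two-sided confidence interval estimate for $\mu_1 - \mu_2$ is

$$\mu_1 - \mu_2 \in \left(\bar{x} - \bar{y} - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}},\; \bar{x} - \bar{y} + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}}\right)$$

One-sided confidence intervals for $\mu_1 - \mu_2$ are obtained in a similar fashion, and we leave it for the reader to verify that a $100(1 - \alpha)$ percent one-sided interval is given by

$$\mu_1 - \mu_2 \in \left(-\infty,\; \bar{x} - \bar{y} + z_\alpha\sqrt{\sigma_1^2/n + \sigma_2^2/m}\right)$$

Program 7.4.1 will compute both one- and two-sided confidence intervals for $\mu_1 - \mu_2$.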

EXAMPLE 7.4a Two different types of electrical cable insulation have recently been tested to determine the voltage level at which failures tend to occur. When specimens were subjected to an increasing voltage stress in a laboratory experiment, failures for the two types of cable insulation occurred at the following voltages:

Type A: 36, 44, 41, 53, 38, 36, 34, 54, 52, 37, 51, 44, 35, 44
Type B: 52, 64, 38, 68, 66, 52, 60, 44, 48, 46, 70, 62

Suppose that it is known that the amount of voltage that cables having type A insulation can withstand is normally distributed with unknown mean $\mu_A$ and known variance $\sigma_A^2 = 40$, whereas the corresponding distribution for type B insulation is normal with unknown mean $\mu_B$ and known variance $\sigma_B^2 = 100$. Determine a 95 percent confidence interval for $\mu_A - \mu_B$. Determine a value that we can assert, with 95 percent confidence, exceeds $\mu_A - \mu_B$.

SOLUTION We run Program 7.4.1 to obtain the solution (see Figure 7.4).
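The known-variance interval for Example 7.4a can be sketched as below. It computes the normal percentiles with `statistics.NormalDist().inv_cdf`, so its endpoints may differ from Figure 7.4 in the third decimal place; the variable names are illustrative.

```python
import math
import statistics

type_a = [36, 44, 41, 53, 38, 36, 34, 54, 52, 37, 51, 44, 35, 44]
type_b = [52, 64, 38, 68, 66, 52, 60, 44, 48, 46, 70, 62]
var_a, var_b = 40, 100  # known population variances
alpha = 0.05

diff = statistics.mean(type_a) - statistics.mean(type_b)
se = math.sqrt(var_a / len(type_a) + var_b / len(type_b))  # sd of Xbar - Ybar
z = statistics.NormalDist()

z_half = z.inv_cdf(1 - alpha / 2)  # z_{alpha/2}
two_sided = (diff - z_half * se, diff + z_half * se)
# One-sided: a value exceeding mu_A - mu_B with 95 percent confidence
upper_bound = diff + z.inv_cdf(1 - alpha) * se
print(two_sided, upper_bound)
```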
FIGURE 7.4 (a) Two-sided and (b) lower 95 percent confidence intervals for Example 7.4a. (Screen output of Program 7.4.1, "Confidence Interval: Two Normal Means. Known Variance," with list 1 of size 14 and population variance 40, list 2 of size 12 and population variance 100, and α = .05. (a) The 95% confidence interval for the mean is (-19.6056, -6.4897). (b) The 95% lower confidence interval for the mean is (-infinity, -7.544).)

Let us suppose now that we again desire an interval estimator of $\mu_1 - \mu_2$ but that the population variances $\sigma_1^2$ and $\sigma_2^2$ are unknown. In this case, it is natural to try to replace $\sigma_1^2$ and $\sigma_2^2$ in Equation 7.4.1 by the sample variances

$$S_1^2 = \sum_{i=1}^n \frac{(X_i - \bar{X})^2}{n - 1}, \qquad S_2^2 = \sum_{j=1}^m \frac{(Y_j - \bar{Y})^2}{m - 1}$$

That is, it is natural to base our interval estimate on something like

$$\frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{S_1^2/n + S_2^2/m}}$$

However, to utilize the foregoing to obtain a confidence interval, we need its distribution, and it must not depend on any of the unknown parameters $\sigma_1^2$ and $\sigma_2^2$. Unfortunately, this distribution is both complicated and does indeed depend on the unknown parameters $\sigma_1^2$ and $\sigma_2^2$. In fact, it is only in the special case when $\sigma_1^2 = \sigma_2^2$ that we will be able to obtain an interval estimator. So let us suppose that the population variances, though unknown, are equal and let $\sigma^2$ denote their common value. Now, from Theorem 6.5.1 it follows that

$$(n - 1)\frac{S_1^2}{\sigma^2} \sim \chi^2_{n-1}$$

and

$$(m - 1)\frac{S_2^2}{\sigma^2} \sim \chi^2_{m-1}$$

Also, because the samples are independent, it follows that these two chi-square random variables are independent. Hence, from the additive property of chi-square random variables, which states that the sum of independent chi-square random variables is also chi-square with a degree of freedom equal to the sum of their degrees of freedom, it follows that

$$(n - 1)\frac{S_1^2}{\sigma^2} + (m - 1)\frac{S_2^2}{\sigma^2} \sim \chi^2_{n+m-2} \tag{7.4.2}$$



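Also, since

$$\bar{X} - \bar{Y} \sim N\left(\mu_1 - \mu_2,\; \frac{\sigma^2}{n} + \frac{\sigma^2}{m}\right)$$

it follows that

$$\frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma^2}{n} + \dfrac{\sigma^2}{m}}} \sim N(0, 1) \tag{7.4.3}$$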



Now it follows from the fundamental result that in normal sampling $\bar{X}$ and $S^2$ are independent (Theorem 6.5.1), that $\bar{X}, S_1^2, \bar{Y}, S_2^2$ are independent random variables. Hence, using the definition of a $t$-random variable (as the ratio of two independent random variables, the numerator being a standard normal and the denominator being the square root of a chi-square random variable divided by its degree of freedom parameter), it follows from Equations 7.4.2 and 7.4.3 that if we let

$$S_p^2 = \frac{(n - 1)S_1^2 + (m - 1)S_2^2}{n + m - 2}$$

then

$$\frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\sigma^2(1/n + 1/m)}} \div \sqrt{\frac{S_p^2}{\sigma^2}} = \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{S_p^2(1/n + 1/m)}}$$

has a $t$-distribution with $n + m - 2$ degrees of freedom. Consequently,

$$P\left\{-t_{\alpha/2,n+m-2} \le \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{S_p\sqrt{1/n + 1/m}} \le t_{\alpha/2,n+m-2}\right\} = 1 - \alpha$$

Therefore, when the data result in the values $\bar{X} = \bar{x}$, $\bar{Y} = \bar{y}$, $S_p = s_p$, we obtain the following $100(1 - \alpha)$ percent confidence interval for $\mu_1 - \mu_2$:

$$\left(\bar{x} - \bar{y} - t_{\alpha/2,n+m-2}\,s_p\sqrt{1/n + 1/m},\; \bar{x} - \bar{y} + t_{\alpha/2,n+m-2}\,s_p\sqrt{1/n + 1/m}\right) \tag{7.4.4}$$

One-sided confidence intervals are similarly obtained. 

Program 7.4.2 can be used to obtain both one- and two-sided confidence intervals for 
the difference in means in two normal populations having unknown but equal variances. 

EXAMPLE 7.4b There are two different techniques a given manufacturer can employ to produce batteries. A random selection of 12 batteries produced by technique I and of 14 produced by technique II resulted in the following capacities (in ampere hours):

Technique I: 140, 136, 138, 150, 152, 144, 132, 142, 150, 154, 136, 142
Technique II: 144, 132, 136, 140, 128, 150, 130, 134, 130, 146, 128, 131, 137, 135

Determine a 90 percent level two-sided confidence interval for the difference in means, assuming a common variance. Also determine a 95 percent upper confidence interval for $\mu_{\rm I} - \mu_{\rm II}$.

SOLUTION We run Program 7.4.2 to obtain the solution (see Figure 7.5). ■
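A sketch of the pooled-variance interval (7.4.4) applied to the battery data follows. The percentile $t_{.05,24} = 1.711$ is a standard table value entered by hand, and `pooled_t_interval` is an illustrative name. Because a 90 percent two-sided interval and a 95 percent one-sided interval use the same percentile, the left endpoint also gives the 95 percent upper confidence interval.

```python
import math
import statistics


def pooled_t_interval(xs, ys, t_quantile):
    """Interval (7.4.4): xbar - ybar +/- t * s_p * sqrt(1/n + 1/m)."""
    n, m = len(xs), len(ys)
    # Pooled variance: degrees-of-freedom weighted average of the sample variances
    sp2 = ((n - 1) * statistics.variance(xs) + (m - 1) * statistics.variance(ys)) / (n + m - 2)
    half_width = t_quantile * math.sqrt(sp2) * math.sqrt(1 / n + 1 / m)
    diff = statistics.mean(xs) - statistics.mean(ys)
    return diff - half_width, diff + half_width


technique_1 = [140, 136, 138, 150, 152, 144, 132, 142, 150, 154, 136, 142]
technique_2 = [144, 132, 136, 140, 128, 150, 130, 134, 130, 146, 128, 131, 137, 135]
low, high = pooled_t_interval(technique_1, technique_2, t_quantile=1.711)  # t_{.05,24}
print(f"90% two-sided: ({low:.4f}, {high:.4f}); 95% upper: ({low:.4f}, inf)")
```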



REMARK 

The confidence interval given by Equation 7.4.4 was obtained under the assumption that the population variances are equal; with $\sigma^2$ as their common value, it follows that

$$\frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\sigma^2/n + \sigma^2/m}} = \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sigma\sqrt{1/n + 1/m}}$$

has a standard normal distribution. However, since $\sigma^2$ is unknown this result cannot be immediately applied to obtain a confidence interval; $\sigma^2$ must first be estimated. To do so, note that both sample variances are estimators of $\sigma^2$; moreover, since $S_1^2$ has $n - 1$ degrees of freedom and $S_2^2$ has $m - 1$, the appropriate estimator is to take a weighted average of the two sample variances, with the weights proportional to these degrees of freedom. That is, the estimator of $\sigma^2$ is the pooled estimator

$$S_p^2 = \frac{(n - 1)S_1^2 + (m - 1)S_2^2}{n + m - 2}$$

and the confidence interval is then based on the statistic

$$\frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{S_p^2}\,\sqrt{1/n + 1/m}}$$

which, by our previous analysis, has a $t$-distribution with $n + m - 2$ degrees of freedom. The results of this section are summarized in Table 7.2.

FIGURE 7.5 (a) Two-sided and (b) upper 90 percent confidence intervals for Example 7.4b. (Screen output of Program 7.4.2, "Confidence Interval: Unknown But Equal Variances," with list 1 of size 12 and list 2 of size 14. (a) The 90% confidence interval for the mean difference is (2.4971, 11.9315).)



TABLE 7.2 $100(1 - \alpha)$ Percent Confidence Intervals for $\mu_1 - \mu_2$

$X_1, \ldots, X_n \sim N(\mu_1, \sigma_1^2)$, $\quad Y_1, \ldots, Y_m \sim N(\mu_2, \sigma_2^2)$

$\bar{X} = \sum_{i=1}^n X_i/n$, $\quad S_1^2 = \sum_{i=1}^n (X_i - \bar{X})^2/(n - 1)$

$\bar{Y} = \sum_{i=1}^m Y_i/m$, $\quad S_2^2 = \sum_{i=1}^m (Y_i - \bar{Y})^2/(m - 1)$

Assumption | Confidence Interval
--- | ---
$\sigma_1, \sigma_2$ known | $\bar{X} - \bar{Y} \pm z_{\alpha/2}\sqrt{\sigma_1^2/n + \sigma_2^2/m}$
$\sigma_1, \sigma_2$ unknown but equal | $\bar{X} - \bar{Y} \pm t_{\alpha/2,n+m-2}\sqrt{\left(\frac{1}{n} + \frac{1}{m}\right)\frac{(n-1)S_1^2 + (m-1)S_2^2}{n + m - 2}}$

Assumption | Lower Confidence Interval
--- | ---
$\sigma_1, \sigma_2$ known | $\left(-\infty, \bar{X} - \bar{Y} + z_\alpha\sqrt{\sigma_1^2/n + \sigma_2^2/m}\right)$
$\sigma_1, \sigma_2$ unknown but equal | $\left(-\infty, \bar{X} - \bar{Y} + t_{\alpha,n+m-2}\sqrt{\left(\frac{1}{n} + \frac{1}{m}\right)\frac{(n-1)S_1^2 + (m-1)S_2^2}{n + m - 2}}\right)$

Note: Upper confidence intervals for $\mu_1 - \mu_2$ are obtained from lower confidence intervals for $\mu_2 - \mu_1$.

7.5 APPROXIMATE CONFIDENCE INTERVAL FOR THE MEAN OF A BERNOULLI RANDOM VARIABLE

Consider a population of items, each of which independently meets certain standards with some unknown probability $p$. If $n$ of these items are tested to determine whether they meet the standards, how can we use the resulting data to obtain a confidence interval for $p$?

If we let $X$ denote the number of the $n$ items that meet the standards, then $X$ is a binomial random variable with parameters $n$ and $p$. Thus, when $n$ is large, it follows by the normal approximation to the binomial that $X$ is approximately normally distributed with mean $np$ and variance $np(1 - p)$. Hence,



$$\frac{X - np}{\sqrt{np(1 - p)}} \,\dot\sim\, N(0, 1)$$

where $\dot\sim$ means "is approximately distributed as." Therefore, for any $\alpha \in (0, 1)$,

$$P\left\{-z_{\alpha/2} < \frac{X - np}{\sqrt{np(1 - p)}} < z_{\alpha/2}\right\} \approx 1 - \alpha \tag{7.5.1}$$

and so if $X$ is observed to equal $x$, then an approximate $100(1 - \alpha)$ percent confidence region for $p$ is

$$\left\{p : -z_{\alpha/2} < \frac{x - np}{\sqrt{np(1 - p)}} < z_{\alpha/2}\right\}$$

The foregoing region, however, is not an interval. To obtain a confidence interval for $p$, let $\hat{p} = X/n$ be the fraction of the items that meet the standards. From Example 7.2a, $\hat{p}$ is the maximum likelihood estimator of $p$, and so should be approximately equal to $p$. As a result, $\sqrt{n\hat{p}(1 - \hat{p})}$ will be approximately equal to $\sqrt{np(1 - p)}$ and so from Equation 7.5.1 we see that

$$\frac{X - np}{\sqrt{n\hat{p}(1 - \hat{p})}} \,\dot\sim\, N(0, 1)$$

Hence, for any $\alpha \in (0, 1)$ we have that

$$P\left\{-z_{\alpha/2} < \frac{X - np}{\sqrt{n\hat{p}(1 - \hat{p})}} < z_{\alpha/2}\right\} \approx 1 - \alpha$$

or, equivalently,

$$P\left\{-z_{\alpha/2}\sqrt{n\hat{p}(1 - \hat{p})} < np - X < z_{\alpha/2}\sqrt{n\hat{p}(1 - \hat{p})}\right\} \approx 1 - \alpha$$

Since $\hat{p} = X/n$, the preceding can be written as

$$P\left\{\hat{p} - z_{\alpha/2}\sqrt{\hat{p}(1 - \hat{p})/n} < p < \hat{p} + z_{\alpha/2}\sqrt{\hat{p}(1 - \hat{p})/n}\right\} \approx 1 - \alpha$$

which yields an approximate $100(1 - \alpha)$ percent confidence interval for $p$.

EXAMPLE 7.5a A sample of 100 transistors is randomly chosen from a large batch and tested to determine if they meet the current standards. If 80 of them meet the standards, then an approximate 95 percent confidence interval for $p$, the fraction of all the transistors that meet the standards, is given by

$$\left(.8 - 1.96\sqrt{.8(.2)/100},\; .8 + 1.96\sqrt{.8(.2)/100}\right) = (.7216, .8784)$$

That is, with "95 percent confidence," between 72.16 and 87.84 percent of all transistors meet the standards. ■
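The interval of Example 7.5a follows directly from the formula just derived. In this minimal sketch the name `wald_interval` is illustrative (this normal-approximation interval is often called the Wald interval), and the exact normal percentile is used in place of 1.96, which does not change the result at four decimal places.

```python
import math
from statistics import NormalDist


def wald_interval(successes, n, alpha):
    """Approximate interval p_hat +/- z_{alpha/2} * sqrt(p_hat (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


print(wald_interval(80, 100, alpha=0.05))
```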

EXAMPLE 7.5b On October 14, 2003, the New York Times reported that a recent poll indicated that 52 percent of the population was in favor of the job performance of President Bush, with a margin of error of $\pm 4$ percent. What does this mean? Can we infer how many people were questioned?

SOLUTION It has become common practice for the news media to present 95 percent confidence intervals. Since $z_{.025} = 1.96$, a 95 percent confidence interval for $p$, the percentage of the population that is in favor of President Bush's job performance, is given by

$$\hat{p} \pm 1.96\sqrt{\hat{p}(1 - \hat{p})/n} = .52 \pm 1.96\sqrt{.52(.48)/n}$$

where $n$ is the size of the sample. Since the "margin of error" is $\pm 4$ percent, it follows that

$$1.96\sqrt{.52(.48)/n} = .04$$

or

$$n = \frac{(1.96)^2(.52)(.48)}{(.04)^2} = 599.29$$

That is, approximately 599 people were sampled, and 52 percent of them reported favorably on President Bush's job performance. ■
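Example 7.5b inverts the margin-of-error formula. A short sketch, assuming as the example does a 95 percent interval with $z_{.025} = 1.96$; the function name is illustrative.

```python
def implied_sample_size(p_hat, margin, z=1.96):
    """Solve z * sqrt(p_hat * (1 - p_hat) / n) = margin for n."""
    return (z / margin) ** 2 * p_hat * (1 - p_hat)


print(round(implied_sample_size(0.52, 0.04), 2))  # prints 599.29
```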

We often want to specify an approximate $100(1 - \alpha)$ percent confidence interval for $p$ that is no greater than some given length, say $b$. The problem is to determine the appropriate sample size $n$ to obtain such an interval. To do so, note that the length of the approximate $100(1 - \alpha)$ percent confidence interval for $p$ from a sample of size $n$ is

$$2z_{\alpha/2}\sqrt{\hat{p}(1 - \hat{p})/n}$$

which is approximately equal to $2z_{\alpha/2}\sqrt{p(1 - p)/n}$. Unfortunately, $p$ is not known in advance, and so we cannot just set $2z_{\alpha/2}\sqrt{p(1 - p)/n}$ equal to $b$ to determine the necessary sample size $n$. What we can do, however, is to first take a preliminary sample to obtain a rough estimate of $p$, and then use this estimate to determine $n$. That is, we use $p^*$, the proportion of the preliminary sample that meets the standards, as a preliminary estimate of $p$; we then determine the total sample size $n$ by solving the equation

$$2z_{\alpha/2}\sqrt{p^*(1 - p^*)/n} = b$$

Squaring both sides of the preceding yields that

$$(2z_{\alpha/2})^2 p^*(1 - p^*)/n = b^2$$

or

$$n = \frac{(2z_{\alpha/2})^2 p^*(1 - p^*)}{b^2}$$

That is, if $k$ items were initially sampled to obtain the preliminary estimate of $p$, then an additional $n - k$ (or 0 if $n \le k$) items should be sampled.

EXAMPLE 7.5c A certain manufacturer produces computer chips; each chip is independently acceptable with some unknown probability $p$. To obtain an approximate 99 percent confidence interval for $p$, whose length is approximately .05, an initial sample of 30 chips has been taken. If 26 of these chips are of acceptable quality, then the preliminary estimate of $p$ is 26/30. Using this value, a 99 percent confidence interval of length approximately .05 would require an approximate sample of size

$$n = \frac{4(z_{.005})^2}{(.05)^2}\,\frac{26}{30}\left(1 - \frac{26}{30}\right) = \frac{4(2.58)^2}{(.05)^2}\,\frac{26}{30}\,\frac{4}{30} \approx 1{,}231$$

Hence, we should now sample an additional 1,201 chips and if, for instance, 1,040 of them are acceptable, then 26 + 1,040 = 1,066 of the 1,231 chips are acceptable and the final 99 percent confidence interval for $p$ is

$$\left(\frac{1{,}066}{1{,}231} - 2.58\sqrt{\frac{1{,}066}{1{,}231}\left(1 - \frac{1{,}066}{1{,}231}\right)\Big/1{,}231},\; \frac{1{,}066}{1{,}231} + 2.58\sqrt{\frac{1{,}066}{1{,}231}\left(1 - \frac{1{,}066}{1{,}231}\right)\Big/1{,}231}\right)$$

or

$$p \in (.84091, .89101)$$ ■
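The two-stage procedure of Example 7.5c can be sketched as follows. The function name is illustrative, the required size is rounded up to the next integer, and $z_{.005} = 2.58$ is taken from the example.

```python
import math


def required_sample_size(p_star, length, z):
    """Total n solving 2 z sqrt(p* (1 - p*) / n) = length, rounded up to an integer."""
    return math.ceil((2 * z / length) ** 2 * p_star * (1 - p_star))


z = 2.58                      # z_{.005}, as used in Example 7.5c
k, pilot_successes = 30, 26   # preliminary sample
n = required_sample_size(pilot_successes / k, length=0.05, z=z)
additional = max(n - k, 0)    # no further sampling if the pilot already suffices

# The final interval pools the pilot and the follow-up samples
successes = pilot_successes + 1040
p_hat = successes / n
half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
print(n, additional, (p_hat - half_width, p_hat + half_width))
```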

REMARK 

As shown, a $100(1 - \alpha)$ percent confidence interval for $p$ will be of approximate length $b$ when the sample size is

$$n = \frac{(2z_{\alpha/2})^2}{b^2}\,p(1 - p)$$

Now it is easily shown that the function $g(p) = p(1 - p)$ attains its maximum value of $\frac{1}{4}$, in the interval $0 \le p \le 1$, when $p = \frac{1}{2}$. Thus an upper bound on $n$ is

$$n \le \frac{(z_{\alpha/2})^2}{b^2}$$

and so by choosing a sample whose size is at least as large as $(z_{\alpha/2})^2/b^2$, one can be assured of obtaining a confidence interval of length no greater than $b$ without need of any additional sampling. ■
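The bound in this remark needs no preliminary sample. A minimal sketch (the function name is illustrative) that computes $z_{\alpha/2}$ with `statistics.NormalDist` and rounds the bound up to an integer:

```python
import math
from statistics import NormalDist


def conservative_sample_size(length, alpha):
    """Smallest n with n >= (z_{alpha/2} / length)^2; valid for every p since p(1 - p) <= 1/4."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((z / length) ** 2)


print(conservative_sample_size(0.05, 0.01))
```

For a 99 percent interval of length .05 this asks for roughly 2,650 items, about twice the 1,231 found in Example 7.5c; that difference is what the preliminary estimate of $p$ buys.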

One-sided approximate confidence intervals for $p$ are also easily obtained; Table 7.3 gives the results.

TABLE 7.3 Approximate $100(1 - \alpha)$ Percent Confidence Intervals for $p$

$X$ is a binomial $(n, p)$ random variable, $\quad \hat{p} = X/n$

Type of Interval | Confidence Interval
--- | ---
Two-sided | $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1 - \hat{p})/n}$
One-sided lower | $\left(-\infty, \hat{p} + z_\alpha\sqrt{\hat{p}(1 - \hat{p})/n}\right)$
One-sided upper | $\left(\hat{p} - z_\alpha\sqrt{\hat{p}(1 - \hat{p})/n}, \infty\right)$

*7.6 CONFIDENCE INTERVAL OF THE MEAN OF THE EXPONENTIAL DISTRIBUTION

If $X_1, X_2, \ldots, X_n$ are independent exponential random variables each having mean $\theta$, then it can be shown that the maximum likelihood estimator of $\theta$ is the sample mean $\sum_{i=1}^n X_i/n$. To obtain a confidence interval estimator of $\theta$, recall from Section 5.7 that $\sum_{i=1}^n X_i$ has a gamma distribution with parameters $n, 1/\theta$. This in turn implies (from the relationship between the gamma and chi-square distribution shown in Section 5.8.1.1) that

$$\frac{2}{\theta}\sum_{i=1}^n X_i \sim \chi^2_{2n}$$

Hence, for any $\alpha \in (0, 1)$

$$P\left\{\chi^2_{1-\alpha/2,2n} < \frac{2}{\theta}\sum_{i=1}^n X_i < \chi^2_{\alpha/2,2n}\right\} = 1 - \alpha$$

or, equivalently,

$$P\left\{\frac{2\sum_{i=1}^n X_i}{\chi^2_{\alpha/2,2n}} < \theta < \frac{2\sum_{i=1}^n X_i}{\chi^2_{1-\alpha/2,2n}}\right\} = 1 - \alpha$$

Hence, a $100(1 - \alpha)$ percent confidence interval for $\theta$ is

$$\theta \in \left(\frac{2\sum_{i=1}^n X_i}{\chi^2_{\alpha/2,2n}},\; \frac{2\sum_{i=1}^n X_i}{\chi^2_{1-\alpha/2,2n}}\right)$$

EXAMPLE 7.6a The successive items produced by a certain manufacturer are assumed to have useful lives that (in hours) are independent with a common density function

$$f(x) = \frac{1}{\theta}e^{-x/\theta}, \qquad 0 < x < \infty$$

If the sum of the lives of the first 10 items is equal to 1,740, what is a 95 percent confidence interval for the population mean $\theta$?

SOLUTION From Program 5.8.1b (or Table A2), we see that

$$\chi^2_{.025,20} = 34.169, \qquad \chi^2_{.975,20} = 9.661$$

and so we can conclude, with 95 percent confidence, that

$$\theta \in \left(\frac{3480}{34.169},\; \frac{3480}{9.661}\right)$$

or, equivalently,

$$\theta \in (101.847, 360.211)$$
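Because the chi-square distribution here has an even number of degrees of freedom $2n$, its distribution function has the closed form $1 - e^{-x/2}\sum_{j=0}^{n-1}(x/2)^j/j!$ (the gamma/Poisson relationship), so the percentiles can be found by bisection with only the standard library. This is a sketch under that even-degrees-of-freedom assumption, and the helper names are illustrative. Computed this way the percentiles come out near 34.170 and 9.591; the second is a little below the 9.661 used in the example, so this sketch gives an upper endpoint near 362.8 rather than 360.2.

```python
import math


def chi2_cdf_even(x, df):
    """P(chi-square_df <= x) for even df = 2k: 1 - exp(-x/2) * sum_{j<k} (x/2)^j / j!."""
    k, half = df // 2, x / 2
    term = total = 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return 1 - math.exp(-half) * total


def chi2_upper_percentile(a, df, lo=0.0, hi=1000.0):
    """chi^2_{a,df}, the point with upper-tail probability a, found by bisection."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if 1 - chi2_cdf_even(mid, df) > a:
            lo = mid  # too much probability above mid, so the percentile lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


def exponential_mean_interval(total, n, alpha):
    """100(1 - alpha)% interval for theta given the sum of n exponential lifetimes."""
    upper_pct = chi2_upper_percentile(alpha / 2, 2 * n)
    lower_pct = chi2_upper_percentile(1 - alpha / 2, 2 * n)
    return 2 * total / upper_pct, 2 * total / lower_pct


print(exponential_mean_interval(1740, 10, 0.05))
```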



*7.7 EVALUATING A POINT ESTIMATOR 

Let $\mathbf{X} = (X_1, \ldots, X_n)$ be a sample from a population whose distribution is specified up to an unknown parameter $\theta$, and let $d = d(\mathbf{X})$ be an estimator of $\theta$. How are we to determine its worth as an estimator of $\theta$? One way is to consider the square of the difference between $d(\mathbf{X})$ and $\theta$. However, since $(d(\mathbf{X}) - \theta)^2$ is a random variable, let us agree to consider $r(d, \theta)$, the mean square error of the estimator $d$, which is defined by

$$r(d, \theta) = E[(d(\mathbf{X}) - \theta)^2]$$

as an indication of the worth of $d$ as an estimator of $\theta$.

It would be nice if there were a single estimator $d$ that minimized $r(d, \theta)$ for all possible values of $\theta$. However, except in trivial situations, this will never be the case. For example, consider the estimator $d^*$ defined by

$$d^*(X_1, \ldots, X_n) = 4$$

That is, no matter what the outcome of the sample data, the estimator $d^*$ chooses 4 as its estimate of $\theta$. While this seems like a silly estimator (since it makes no use of the data), it is, however, true that when $\theta$ actually equals 4, the mean square error of this estimator is 0. Thus, the mean square error of any estimator different than $d^*$ must, in most situations, be larger than the mean square error of $d^*$ when $\theta = 4$.

Although minimum mean square estimators rarely exist, it is sometimes possible to find an estimator having the smallest mean square error among all estimators that satisfy a certain property. One such property is that of unbiasedness.

Definition

Let $d = d(\mathbf{X})$ be an estimator of the parameter $\theta$. Then

$$b_\theta(d) = E[d(\mathbf{X})] - \theta$$

is called the bias of $d$ as an estimator of $\theta$. If $b_\theta(d) = 0$ for all $\theta$, then we say that $d$ is an unbiased estimator of $\theta$. In other words, an estimator is unbiased if its expected value always equals the value of the parameter it is attempting to estimate.

EXAMPLE 7.7a Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution having unknown mean $\theta$. Then

$$d_1(X_1, X_2, \ldots, X_n) = X_1$$

and

$$d_2(X_1, X_2, \ldots, X_n) = \frac{X_1 + X_2 + \cdots + X_n}{n}$$

are both unbiased estimators of $\theta$ since

$$E[X_1] = E\left[\frac{X_1 + X_2 + \cdots + X_n}{n}\right] = \theta$$

More generally, $d_3(X_1, X_2, \ldots, X_n) = \sum_{i=1}^n \lambda_i X_i$ is an unbiased estimator of $\theta$ whenever $\sum_{i=1}^n \lambda_i = 1$. This follows since

$$E\left[\sum_{i=1}^n \lambda_i X_i\right] = \sum_{i=1}^n \lambda_i E[X_i] = \theta\sum_{i=1}^n \lambda_i = \theta$$

If $d(X_1, \ldots, X_n)$ is an unbiased estimator, then its mean square error is given by

$$\begin{aligned} r(d, \theta) &= E[(d(\mathbf{X}) - \theta)^2] \\ &= E[(d(\mathbf{X}) - E[d(\mathbf{X})])^2] \quad \text{since } d \text{ is unbiased} \\ &= \mathrm{Var}(d(\mathbf{X})) \end{aligned}$$

Thus the mean square error of an unbiased estimator is equal to its variance.

EXAMPLE 7.7b Combining Independent Unbiased Estimators. Let $d_1$ and $d_2$ denote independent unbiased estimators of $\theta$, having known variances $\sigma_1^2$ and $\sigma_2^2$. That is, for $i = 1, 2$,

$$E[d_i] = \theta, \qquad \mathrm{Var}(d_i) = \sigma_i^2$$

Any estimator of the form

$$d = \lambda d_1 + (1 - \lambda)d_2$$

will also be unbiased. To determine the value of $\lambda$ that results in $d$ having the smallest possible mean square error, note that

$$\begin{aligned} r(d, \theta) &= \mathrm{Var}(d) \\ &= \lambda^2\mathrm{Var}(d_1) + (1 - \lambda)^2\mathrm{Var}(d_2) \quad \text{by the independence of } d_1 \text{ and } d_2 \\ &= \lambda^2\sigma_1^2 + (1 - \lambda)^2\sigma_2^2 \end{aligned}$$

Differentiation yields that

$$\frac{d}{d\lambda}r(d, \theta) = 2\lambda\sigma_1^2 - 2(1 - \lambda)\sigma_2^2$$

To determine the value of $\lambda$ that minimizes $r(d, \theta)$ (call it $\hat{\lambda}$), set this equal to 0 and solve for $\lambda$ to obtain

$$2\hat{\lambda}\sigma_1^2 = 2(1 - \hat{\lambda})\sigma_2^2$$

or

$$\hat{\lambda} = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2} = \frac{1/\sigma_1^2}{1/\sigma_1^2 + 1/\sigma_2^2}$$

In words, the optimal weight to give an estimator is inversely proportional to its variance (when all the estimators are unbiased and independent).

For an application of the foregoing, suppose that a conservation organization wants to determine the acidity content of a certain lake. To determine this quantity, they draw some water from the lake and then send samples of this water to $n$ different laboratories. These laboratories will then, independently, test for acidity content by using their respective titration equipment, which is of differing precision. Specifically, suppose that $d_i$, the result of a titration test at laboratory $i$, is a random variable having mean $\theta$, the true acidity of the sample water, and variance $\sigma_i^2$, $i = 1, \ldots, n$. If the quantities $\sigma_i^2$, $i = 1, \ldots, n$ are known to the conservation organization, then they should estimate the acidity of the sampled water from the lake by

$$d = \frac{\sum_{i=1}^n d_i/\sigma_i^2}{\sum_{i=1}^n 1/\sigma_i^2}$$

The mean square error of $d$ is as follows:

$$\begin{aligned} r(d, \theta) &= \mathrm{Var}(d) \quad \text{since } d \text{ is unbiased} \\ &= \left(\frac{1}{\sum_{i=1}^n 1/\sigma_i^2}\right)^2 \sum_{i=1}^n \left(\frac{1}{\sigma_i^2}\right)^2 \sigma_i^2 \\ &= \frac{1}{\sum_{i=1}^n 1/\sigma_i^2} \end{aligned}$$
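The lake example amounts to inverse-variance weighting. The sketch below is a minimal illustration; the laboratory readings and variances are made-up numbers for demonstration only, not data from the text, and the function name is illustrative.

```python
def inverse_variance_estimate(estimates, variances):
    """Combine independent unbiased estimates with weights proportional to 1/variance.

    Returns the combined estimate and its variance, 1 / sum(1/variance).
    """
    weights = [1 / v for v in variances]
    total_weight = sum(weights)
    combined = sum(w * d for w, d in zip(weights, estimates)) / total_weight
    return combined, 1 / total_weight


# Hypothetical readings from three laboratories (illustrative numbers only)
readings = [6.8, 7.1, 6.9]
lab_variances = [0.04, 0.09, 0.01]
combined, variance = inverse_variance_estimate(readings, lab_variances)
print(combined, variance)
```

The combined variance is smaller than that of even the most precise laboratory, and with equal variances the rule reduces to the ordinary average.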

A generalization of the result that the mean square error of an unbiased estimator is 
equal to its variance is that the mean square error of any estimator is equal to its variance 
plus the square of its bias. This follows since 

r{d, 6) = E[{d(X) - 6) 2 ] 

= E[{d - E[d] + E[d] - 6) 2 ] 

= E[{d - E[d]) 2 + (E[d] - 6) 2 + 2(E[d] - 9){d - E[d])] 

= E[{d - E[d]) 2 ~\ + E[{E[d] - 0) 2 ] 

+ 2E[(E[d] - 0){d - E[d~\)~\ 
= E[{d - E[d]) 2 ] + {E[d] - 6) 2 + 2{E[d] - 0)E[d - E[d]] 

since E[d] — 8 is constant 
= E[(d - E[d]f] + {E[d] - 9) 2 

The last equality follows since 

E[d - E[d]] = 
Hence 

r(d, 9) = Var(^/) + b 2 {d) 
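EXAMPLE 7.7c Let $X_1, \ldots, X_n$ denote a sample from a uniform $(0, \theta)$ distribution, where
$\theta$ is assumed unknown. Since

$$E[X_i] = \frac{\theta}{2}$$

a "natural" estimator to consider is the unbiased estimator

$$d_1 = d_1(\mathbf{X}) = \frac{2\sum_{i=1}^{n} X_i}{n}$$

Since $E[d_1] = \theta$, it follows that

$$
\begin{aligned}
r(d_1, \theta) &= \operatorname{Var}(d_1) \\
&= \frac{4}{n}\operatorname{Var}(X_i) \\
&= \frac{4}{n}\,\frac{\theta^2}{12} \qquad \text{since } \operatorname{Var}(X_i) = \frac{\theta^2}{12} \\
&= \frac{\theta^2}{3n}
\end{aligned}
$$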

A second possible estimator of $\theta$ is the maximum likelihood estimator, which, as shown
in Example 7.2d, is given by

$$d_2 = d_2(\mathbf{X}) = \max_i X_i$$

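To compute the mean square error of $d_2$ as an estimator of $\theta$, we need to first compute
its mean (so as to determine its bias) and variance. To do so, note that the distribution
function of $d_2$ is as follows:

$$
\begin{aligned}
F_2(x) &= P\{d_2(\mathbf{X}) \le x\} \\
&= P\{\max_i X_i \le x\} \\
&= P\{X_i \le x \text{ for all } i = 1, \ldots, n\} \\
&= \prod_{i=1}^{n} P\{X_i \le x\} \qquad \text{by independence} \\
&= \left(\frac{x}{\theta}\right)^{n}, \qquad 0 \le x \le \theta
\end{aligned}
$$

Hence, upon differentiating, we obtain that the density function of $d_2$ is

$$f_2(x) = \frac{n x^{n-1}}{\theta^n}, \qquad 0 \le x \le \theta$$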
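Therefore,

$$E[d_2] = \int_0^\theta x\,\frac{n x^{n-1}}{\theta^n}\,dx = \frac{n}{n+1}\,\theta \tag{7.7.1}$$

Also

$$E[d_2^2] = \int_0^\theta x^2\,\frac{n x^{n-1}}{\theta^n}\,dx = \frac{n}{n+2}\,\theta^2$$

and so

$$
\begin{aligned}
\operatorname{Var}(d_2) &= \frac{n\theta^2}{n+2} - \left(\frac{n\theta}{n+1}\right)^2 \\
&= n\theta^2\left[\frac{1}{n+2} - \frac{n}{(n+1)^2}\right] \\
&= \frac{n\theta^2}{(n+2)(n+1)^2}
\end{aligned}
\tag{7.7.2}
$$

Hence

$$
\begin{aligned}
r(d_2, \theta) &= (E[d_2] - \theta)^2 + \operatorname{Var}(d_2) \\
&= \frac{\theta^2}{(n+1)^2} + \frac{n\theta^2}{(n+2)(n+1)^2} \\
&= \frac{\theta^2}{(n+1)^2}\left[1 + \frac{n}{n+2}\right] \\
&= \frac{2\theta^2}{(n+1)(n+2)}
\end{aligned}
\tag{7.7.3}
$$

Since

$$\frac{2\theta^2}{(n+1)(n+2)} \le \frac{\theta^2}{3n}, \qquad n = 1, 2, \ldots$$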

it follows that $d_2$ is a superior estimator of $\theta$ to $d_1$ (its mean square error is never larger, and it is strictly smaller for $n \ge 3$).
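A small Monte Carlo sketch can be used to check the mean square errors $\theta^2/(3n)$ of $d_1$ and $2\theta^2/((n+1)(n+2))$ of $d_2$. The values $\theta = 1$, $n = 10$, the number of replications, the seed, and the name `simulate_mse` are arbitrary assumptions; only the Python standard library is used.

```python
import random


def simulate_mse(theta, n, reps, seed=0):
    """Monte Carlo estimates of the mean square errors of d1 = 2 * mean(X) and
    d2 = max(X) for samples of size n from a uniform (0, theta) distribution."""
    rng = random.Random(seed)
    sq_err1 = sq_err2 = 0.0
    for _ in range(reps):
        xs = [rng.uniform(0.0, theta) for _ in range(n)]
        d1 = 2.0 * sum(xs) / n   # unbiased estimator
        d2 = max(xs)             # maximum likelihood estimator
        sq_err1 += (d1 - theta) ** 2
        sq_err2 += (d2 - theta) ** 2
    return sq_err1 / reps, sq_err2 / reps


theta, n = 1.0, 10
mc1, mc2 = simulate_mse(theta, n, reps=20_000)
exact1 = theta ** 2 / (3 * n)                   # r(d1, theta)
exact2 = 2 * theta ** 2 / ((n + 1) * (n + 2))   # Equation 7.7.3
print(mc1, exact1)
print(mc2, exact2)
```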

Equation 7.7.1 suggests the use of yet another estimator, namely, the unbiased
estimator $(1 + 1/n)d_2(\mathbf{X}) = (1 + 1/n)\max_i X_i$. However, rather than considering this
estimator directly, let us consider all estimators of the form

$$d_c(\mathbf{X}) = c\max_i X_i = c\,d_2(\mathbf{X})$$

where $c$ is a given constant. The mean square error of this estimator is

$$
\begin{aligned}
r(d_c(\mathbf{X}), \theta) &= \operatorname{Var}(d_c(\mathbf{X})) + (E[d_c(\mathbf{X})] - \theta)^2 \\
&= c^2\operatorname{Var}(d_2(\mathbf{X})) + (cE[d_2(\mathbf{X})] - \theta)^2 \\
&= \frac{c^2 n\theta^2}{(n+2)(n+1)^2} + \theta^2\left(\frac{cn}{n+1} - 1\right)^2 \qquad \text{by Equations 7.7.2 and 7.7.1}
\end{aligned}
\tag{7.7.4}
$$
To determine the constant $c$ resulting in minimal mean square error, we differentiate to
obtain

$$\frac{d}{dc}\,r(d_c(\mathbf{X}), \theta) = \frac{2cn\theta^2}{(n+2)(n+1)^2} + \frac{2\theta^2 n}{n+1}\left(\frac{cn}{n+1} - 1\right)$$

Equating this to 0 shows that the best constant $c$, call it $c^*$, is such that

$$\frac{c^*}{n+2} + c^* n - (n+1) = 0$$

or

$$c^* = \frac{(n+1)(n+2)}{n^2 + 2n + 1} = \frac{n+2}{n+1}$$

Substituting this value of $c$ into Equation 7.7.4 yields that

$$
\begin{aligned}
r\left(\frac{n+2}{n+1}\max_i X_i, \theta\right) &= \frac{(n+2)n\theta^2}{(n+1)^4} + \theta^2\left(\frac{n(n+2)}{(n+1)^2} - 1\right)^2 \\
&= \frac{(n+2)n\theta^2}{(n+1)^4} + \frac{\theta^2}{(n+1)^4} \\
&= \frac{\theta^2}{(n+1)^2}
\end{aligned}
$$

A comparison with Equation 7.7.3 shows that the (biased) estimator $\frac{n+2}{n+1}\max_i X_i$
has about half the mean square error of the maximum likelihood
estimator $\max_i X_i$. ■
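Because Equation 7.7.4 is a rational function of $c$, it can be evaluated exactly with Python's `fractions` module to compare the maximum likelihood estimator ($c = 1$), the unbiased estimator ($c = 1 + 1/n$), and $c^*$. The helper `mse_scaled_max` and the choice $\theta = 1$, $n = 10$ are assumptions made for illustration.

```python
from fractions import Fraction


def mse_scaled_max(c, n, theta=1):
    """Equation 7.7.4: mean square error of c * max_i X_i for a sample of
    size n from a uniform (0, theta) distribution, computed exactly."""
    c, theta = Fraction(c), Fraction(theta)
    variance_part = c ** 2 * n * theta ** 2 / ((n + 2) * (n + 1) ** 2)
    bias = c * n * theta / (n + 1) - theta
    return variance_part + bias ** 2


n = 10
c_star = Fraction(n + 2, n + 1)
mle = mse_scaled_max(1, n)                        # c = 1, the MLE
unbiased = mse_scaled_max(Fraction(n + 1, n), n)  # c = 1 + 1/n
best = mse_scaled_max(c_star, n)                  # c = c*
print(mle, unbiased, best)   # → 1/66 1/120 1/121
```

Here the unbiased choice $c = 1 + 1/n$ falls between the other two, a reminder that removing bias does not by itself minimize mean square error.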

*7.8 THE BAYES ESTIMATOR

* Optional section.

In certain situations it seems reasonable to regard an unknown parameter $\theta$ as being the
value of a random variable from a given probability distribution. This usually arises when,
prior to observing the outcomes of the data $X_1, \ldots, X_n$, we have some information
about the value of $\theta$ and this information is expressible in terms of a probability distribution
(called, appropriately, the prior distribution of $\theta$). For instance, suppose that from past
experience we know that $\theta$ is equally likely to be near any value in the interval (0, 1).
Hence, we could reasonably assume that $\theta$ is chosen from a uniform distribution on (0, 1).
Suppose now that our prior feelings about $\theta$ are that it can be regarded as being the
value of a continuous random variable having probability density function $p(\theta)$; and
suppose that we are about to observe the value of a sample whose distribution depends
on $\theta$. Specifically, suppose that $f(x \mid \theta)$ represents the likelihood, that is, the
probability mass function in the discrete case or the probability density function in the
continuous case, that a data value is equal to $x$ when $\theta$ is the value of the parameter.
If the observed data values are $X_i = x_i$, $i = 1, \ldots, n$, then the updated, or conditional,
probability density function of $\theta$ is as follows:

$$f(\theta \mid x_1, \ldots, x_n) = \frac{p(\theta)f(x_1, \ldots, x_n \mid \theta)}{\int f(x_1, \ldots, x_n \mid \theta)p(\theta)\,d\theta}$$

The conditional density function $f(\theta \mid x_1, \ldots, x_n)$ is called the posterior density function.
(Thus, before observing the data, one's feelings about $\theta$ are expressed in terms of the
prior distribution, whereas once the data are observed, this prior distribution is updated
to yield the posterior distribution.)

Now we have shown that whenever we are given the probability distribution of a random
variable, the best estimate of the value of that random variable, in the sense of minimizing
the expected squared error, is its mean. Therefore, it follows that the best estimate of
$\theta$, given the data values $X_i = x_i$, $i = 1, \ldots, n$, is the mean of the posterior distribution
$f(\theta \mid x_1, \ldots, x_n)$. This estimator, called the Bayes estimator, is written as $E[\theta \mid X_1, \ldots, X_n]$.
That is, if $X_i = x_i$, $i = 1, \ldots, n$, then the value of the Bayes estimator is

$$E[\theta \mid X_1 = x_1, \ldots, X_n = x_n] = \int \theta f(\theta \mid x_1, \ldots, x_n)\,d\theta$$
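When the integral in the denominator of the posterior density has no convenient closed form, the Bayes estimator can be approximated numerically. The following is a minimal sketch, assuming a one-dimensional parameter confined to a known interval (lo, hi) and a simple midpoint-rule grid; the function names are hypothetical, and the illustration assumes 6 successes in 10 Bernoulli trials with a uniform prior on (0, 1).

```python
def posterior_mean_grid(likelihood, prior, lo, hi, points=2000):
    """Approximate the Bayes estimator E[theta | data], the integral of
    theta * f(theta | x) over (lo, hi), using the midpoint rule.

    The unnormalized posterior likelihood(theta) * prior(theta) is summed on a grid;
    the normalizing integral and the grid spacing cancel in the ratio.
    """
    h = (hi - lo) / points
    numerator = denominator = 0.0
    for k in range(points):
        theta = lo + (k + 0.5) * h   # midpoint of the k-th cell
        weight = likelihood(theta) * prior(theta)
        numerator += theta * weight
        denominator += weight
    return numerator / denominator


# Assumed illustration: 6 successes in 10 Bernoulli trials, uniform prior on (0, 1).
def bernoulli_likelihood(theta, successes=6, trials=10):
    return theta ** successes * (1.0 - theta) ** (trials - successes)


estimate = posterior_mean_grid(bernoulli_likelihood, lambda theta: 1.0, 0.0, 1.0)
print(estimate)
```

For this particular setting Example 7.8a gives the exact answer 7/12, so the grid value can be checked against it; a finer grid (larger `points`) trades speed for accuracy.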

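EXAMPLE 7.8a Suppose that $X_1, \ldots, X_n$ are independent Bernoulli random variables, each
having probability mass function given by

$$f(x \mid \theta) = \theta^x (1 - \theta)^{1-x}, \qquad x = 0, 1$$

where $\theta$ is unknown. Further, suppose that $\theta$ is chosen from a uniform distribution on
(0, 1). Compute the Bayes estimator of $\theta$.

SOLUTION We must compute $E[\theta \mid X_1, \ldots, X_n]$. Since the prior density of $\theta$ is the uniform
density

$$p(\theta) = 1, \qquad 0 < \theta < 1$$

we have that the conditional density of $\theta$ given $X_1, \ldots, X_n$ is given by

$$
\begin{aligned}
f(\theta \mid x_1, \ldots, x_n) &= \frac{f(x_1, \ldots, x_n, \theta)}{f(x_1, \ldots, x_n)} \\
&= \frac{f(x_1, \ldots, x_n \mid \theta)p(\theta)}{\int_0^1 f(x_1, \ldots, x_n \mid \theta)p(\theta)\,d\theta} \\
&= \frac{\theta^{\sum_{i=1}^n x_i}(1 - \theta)^{n - \sum_{i=1}^n x_i}}{\int_0^1 \theta^{\sum_{i=1}^n x_i}(1 - \theta)^{n - \sum_{i=1}^n x_i}\,d\theta}
\end{aligned}
$$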
Now it can be shown that for integral values $m$ and $r$

$$\int_0^1 \theta^m (1 - \theta)^r\,d\theta = \frac{m!\,r!}{(m + r + 1)!} \tag{7.8.1}$$

Hence, upon letting $x = \sum_{i=1}^n x_i$,

$$f(\theta \mid x_1, \ldots, x_n) = \frac{(n+1)!\,\theta^x (1 - \theta)^{n-x}}{x!\,(n - x)!} \tag{7.8.2}$$

Therefore,

$$
\begin{aligned}
E[\theta \mid x_1, \ldots, x_n] &= \frac{(n+1)!}{x!\,(n - x)!}\int_0^1 \theta^{1+x}(1 - \theta)^{n-x}\,d\theta \\
&= \frac{(n+1)!}{x!\,(n - x)!}\,\frac{(1 + x)!\,(n - x)!}{(n+2)!} \qquad \text{from Equation 7.8.1} \\
&= \frac{x + 1}{n + 2}
\end{aligned}
$$

Thus, the Bayes estimator is given by

$$E[\theta \mid X_1, \ldots, X_n] = \frac{\sum_{i=1}^n X_i + 1}{n + 2}$$

As an illustration, if 10 independent trials, each of which results in a success with probability
$\theta$, result in 6 successes, then assuming a uniform (0, 1) prior distribution on $\theta$, the Bayes
estimator of $\theta$ is 7/12 (as opposed, for instance, to the maximum likelihood estimator
of 6/10). ■
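The calculation above can be mirrored exactly in Python using Equation 7.8.1 and the `fractions` module. This is an illustrative sketch: the function names and the data vector are hypothetical, and the code assumes the observations are coded as 0s and 1s.

```python
from fractions import Fraction
from math import factorial


def beta_integral(m, r):
    """Equation 7.8.1: the integral over (0, 1) of theta^m (1 - theta)^r,
    for nonnegative integers m and r."""
    return Fraction(factorial(m) * factorial(r), factorial(m + r + 1))


def bayes_bernoulli_uniform(xs):
    """Bayes estimator of theta from 0/1 data under a uniform (0, 1) prior.

    The posterior mean is a ratio of two integrals of the form in Equation 7.8.1;
    the constant (n+1)!/(x!(n-x)!) of Equation 7.8.2 cancels.
    """
    n, x = len(xs), sum(xs)
    return beta_integral(x + 1, n - x) / beta_integral(x, n - x)


data = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]   # hypothetical: 6 successes in 10 trials
print(bayes_bernoulli_uniform(data), Fraction(sum(data), len(data)))  # → 7/12 3/5
```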

REMARK

The conditional distribution of $\theta$ given that $X_i = x_i$, $i = 1, \ldots, n$, whose density function
is given by Equation 7.8.2, is called the beta distribution with parameters $\sum_{i=1}^n x_i + 1$
and $n - \sum_{i=1}^n x_i + 1$.

EXAMPLE 7.8b Suppose $X_1, \ldots, X_n$ are independent normal random variables, each having
unknown mean $\theta$ and known variance $\sigma_0^2$. If $\theta$ is itself selected from a normal population
having known mean $\mu$ and known variance $\sigma^2$, what is the Bayes estimator of $\theta$?

SOLUTION In order to determine $E[\theta \mid X_1, \ldots, X_n]$, the Bayes estimator, we need first
to determine the conditional density of $\theta$ given the values of $X_1, \ldots, X_n$. Now

$$f(\theta \mid x_1, \ldots, x_n) = \frac{f(x_1, \ldots, x_n \mid \theta)p(\theta)}{f(x_1, \ldots, x_n)}$$

where

$$f(x_1, \ldots, x_n \mid \theta) = \frac{1}{(2\pi)^{n/2}\sigma_0^n}\exp\left\{-\sum_{i=1}^n (x_i - \theta)^2 / 2\sigma_0^2\right\}$$

$$p(\theta) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\{-(\theta - \mu)^2 / 2\sigma^2\}$$

and

$$f(x_1, \ldots, x_n) = \int_{-\infty}^{\infty} f(x_1, \ldots, x_n \mid \theta)p(\theta)\,d\theta$$

With the help of a little algebra, it can now be shown that this conditional density is a
normal density with mean

$$
\begin{aligned}
E[\theta \mid X_1, \ldots, X_n] &= \frac{n\sigma^2}{n\sigma^2 + \sigma_0^2}\,\overline{X} + \frac{\sigma_0^2}{n\sigma^2 + \sigma_0^2}\,\mu \\
&= \frac{n/\sigma_0^2}{n/\sigma_0^2 + 1/\sigma^2}\,\overline{X} + \frac{1/\sigma^2}{n/\sigma_0^2 + 1/\sigma^2}\,\mu
\end{aligned}
\tag{7.8.3}
$$

and variance

$$\operatorname{Var}(\theta \mid X_1, \ldots, X_n) = \frac{\sigma_0^2\sigma^2}{\sigma_0^2 + n\sigma^2}$$

Writing the Bayes estimator as we did in Equation 7.8.3 is informative, for it shows that it
is a weighted average of $\overline{X}$, the sample mean, and $\mu$, the a priori mean. In fact, the weights
given to these two quantities are in proportion to the inverses of $\sigma_0^2/n$ (the conditional
variance of the sample mean $\overline{X}$ given $\theta$) and $\sigma^2$ (the variance of the prior distribution). ■
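Equation 7.8.3 and the posterior variance translate directly into code. The sketch below writes them in terms of precisions (reciprocal variances), which makes the weighted-average interpretation explicit; the function name, the data, and the prior parameters are hypothetical.

```python
def normal_posterior(xs, sigma0_sq, mu, sigma_sq):
    """Posterior mean and variance of theta when X_1, ..., X_n are normal with
    unknown mean theta and known variance sigma0_sq, and theta has a
    normal(mu, sigma_sq) prior (Equation 7.8.3 and the variance that follows it)."""
    n = len(xs)
    xbar = sum(xs) / n
    data_precision = n / sigma0_sq     # reciprocal of sigma0^2 / n
    prior_precision = 1.0 / sigma_sq   # reciprocal of the prior variance
    total = data_precision + prior_precision
    mean = (data_precision * xbar + prior_precision * mu) / total
    return mean, 1.0 / total


# Hypothetical data and prior, chosen only for illustration.
xs = [9.8, 10.4, 10.1, 9.9]
post_mean, post_var = normal_posterior(xs, sigma0_sq=0.25, mu=10.5, sigma_sq=1.0)
print(post_mean, post_var)
```

Making the prior vaguer (larger `sigma_sq`) shrinks `prior_precision`, so the estimate moves toward $\overline{X}$; making it sharper pulls the estimate toward $\mu$.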

REMARK: ON CHOOSING A NORMAL PRIOR 

As illustrated by Example 7.8b, it is computationally very convenient to choose a normal 
prior for the unknown mean of a normal distribution — for then the Bayes estimator 
is simply given by Equation 7.8.3. This raises the question of how one should go about 
determining whether there is a normal prior that reasonably represents one's prior feelings 
about the unknown mean. 

To begin, it seems reasonable to determine the value, call it $\mu$, that you a priori
feel is most likely to be near $\theta$. That is, we start with the mode (which equals the mean
when the distribution is normal) of the prior distribution. We should then try to ascertain
whether or not we believe that the prior distribution is symmetric about $\mu$. That is, for
each $a > 0$ do we believe that it is just as likely that $\theta$ will lie between $\mu - a$ and $\mu$ as it is
that it will be between $\mu$ and $\mu + a$? If the answer is positive, then we accept, as a working
hypothesis, that our prior feelings about $\theta$ can be expressed in terms of a prior distribution
that is normal with mean $\mu$. To determine $\sigma$, the standard deviation of the normal prior,
think of an interval centered about $\mu$ that you a priori feel is 90 percent certain to contain
$\theta$. For instance, suppose you feel 90 percent (no more and no less) certain that $\theta$ will lie
between $\mu - a$ and $\mu + a$. Then, since a normal random variable $\theta$ with mean $\mu$ and
variance $\sigma^2$ is such that

$$P\left\{-1.645 < \frac{\theta - \mu}{\sigma} < 1.645\right\} = .90$$

or

$$P\{\mu - 1.645\sigma < \theta < \mu + 1.645\sigma\} = .90$$

it seems reasonable to take

$$1.645\sigma = a \qquad \text{or} \qquad \sigma = \frac{a}{1.645}$$

Thus, if your prior feelings can indeed be reasonably described by a normal distribution,
then that distribution would have mean $\mu$ and standard deviation $\sigma = a/1.645$. As a test
of whether this distribution indeed fits your prior feelings you might ask yourself such
questions as whether you are 95 percent certain that $\theta$ will fall between $\mu - 1.96\sigma$ and
$\mu + 1.96\sigma$, or whether you are 99 percent certain that $\theta$ will fall between $\mu - 2.58\sigma$ and
$\mu + 2.58\sigma$, where these intervals are determined by the equalities

$$P\left\{-1.96 < \frac{\theta - \mu}{\sigma} < 1.96\right\} = .95$$

$$P\left\{-2.58 < \frac{\theta - \mu}{\sigma} < 2.58\right\} = .99$$

which hold when $\theta$ is normal with mean $\mu$ and variance $\sigma^2$.
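The elicitation recipe above can be sketched with Python's `statistics.NormalDist`. This version computes the exact standard normal quantile rather than the rounded value 1.645, and the elicited values $\mu = 50$, $a = 10$ are hypothetical; it then reports the probabilities of the $\pm 1.645\sigma$, $\pm 1.96\sigma$, and $\pm 2.58\sigma$ intervals used as consistency checks.

```python
from statistics import NormalDist


def normal_prior_from_interval(mu, a, prob=0.90):
    """Normal prior with mean mu for which (mu - a, mu + a) has probability prob.

    For prob = .90 the required quantile is about 1.645, so sigma is about a / 1.645.
    """
    z = NormalDist().inv_cdf(0.5 + prob / 2.0)
    return NormalDist(mu, a / z)


prior = normal_prior_from_interval(mu=50.0, a=10.0)   # hypothetical elicited values
# The consistency checks suggested in the text: probabilities of the intervals
# mu +/- 1.645 sigma, mu +/- 1.96 sigma, and mu +/- 2.58 sigma.
for k in (1.645, 1.96, 2.58):
    lo = prior.mean - k * prior.stdev
    hi = prior.mean + k * prior.stdev
    print(k, round(prior.cdf(hi) - prior.cdf(lo), 3))
```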

EXAMPLE 7.8c Consider the likelihood function $f(x_1, \ldots, x_n \mid \theta)$ and suppose that $\theta$ is
uniformly distributed over some interval $(a, b)$. The posterior density of $\theta$ given $X_1, \ldots, X_n$
equals

$$
\begin{aligned}
f(\theta \mid x_1, \ldots, x_n) &= \frac{f(x_1, \ldots, x_n \mid \theta)p(\theta)}{\int_a^b f(x_1, \ldots, x_n \mid \theta)p(\theta)\,d\theta} \\
&= \frac{f(x_1, \ldots, x_n \mid \theta)}{\int_a^b f(x_1, \ldots, x_n \mid \theta)\,d\theta}, \qquad a < \theta < b
\end{aligned}
$$

Now the mode of a density $f(\theta)$ was defined to be that value of $\theta$ that maximizes $f(\theta)$.
By the foregoing, it follows that the mode of the density $f(\theta \mid x_1, \ldots, x_n)$ is that value of $\theta$
maximizing $f(x_1, \ldots, x_n \mid \theta)$; that is, it is just the maximum likelihood estimate of $\theta$ [when
it is constrained to be in $(a, b)$]. In other words, the maximum likelihood estimate equals
the mode of the posterior distribution when a uniform prior distribution is assumed. ■
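The conclusion of Example 7.8c can be illustrated on a grid: with a flat prior on $(a, b)$, maximizing the posterior density is the same as maximizing the likelihood over $(a, b)$. This sketch assumes hypothetical Bernoulli data (6 successes in 10 trials) and uses a simple grid search rather than calculus; the function names are illustrative.

```python
def posterior_mode_uniform_prior(likelihood, a, b, points=10_000):
    """Mode of the posterior density when theta has a uniform (a, b) prior.

    With a flat prior the posterior is proportional to the likelihood on (a, b),
    so the grid maximizer is also the maximum likelihood estimate constrained to (a, b).
    """
    h = (b - a) / points
    grid = [a + (k + 0.5) * h for k in range(points)]   # cell midpoints
    return max(grid, key=likelihood)


# Hypothetical Bernoulli data: 6 successes in 10 trials (unconstrained MLE = 0.6).
def likelihood(theta, successes=6, trials=10):
    return theta ** successes * (1.0 - theta) ** (trials - successes)


print(posterior_mode_uniform_prior(likelihood, 0.0, 1.0))   # close to 0.6
print(posterior_mode_uniform_prior(likelihood, 0.2, 0.5))   # pushed to the boundary, near 0.5
```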

If, rather than a point estimate, we desire an interval in which $\theta$ lies with a specified
probability, say $1 - \alpha$, we can accomplish this by choosing values $a$ and $b$ such that

$$\int_a^b f(\theta \mid x_1, \ldots, x_n)\,d\theta = 1 - \alpha$$



EXAMPLE 7.8d Suppose that if a signal of value s is sent from location A, then the signal 
value received at location B is normally distributed with mean s and variance 60. Suppose 
also that the value of a signal sent at location A is, a priori, known to be normally distributed 
with mean 50 and variance 100. If the value received at location B is equal to 40, determine 
an interval that will contain the actual value sent with probability .90. 

SOLUTION It follows from Example 7.8b that the conditional distribution of $S$, the signal
value sent, given that 40 is the value received, is normal with mean and variance given by

$$E[S \mid \text{data}] = \frac{1/60}{1/60 + 1/100}\,40 + \frac{1/100}{1/60 + 1/100}\,50 = 43.75$$

$$\operatorname{Var}(S \mid \text{data}) = \frac{1}{1/60 + 1/100} = 37.5$$

Hence, given that the value received is 40, $(S - 43.75)/\sqrt{37.5}$ has a standard normal
distribution and so

$$P\left\{-1.645 < \frac{S - 43.75}{\sqrt{37.5}} < 1.645 \,\middle|\, \text{data}\right\} = .90$$

or

$$P\{43.75 - 1.645\sqrt{37.5} < S < 43.75 + 1.645\sqrt{37.5} \mid \text{data}\} = .90$$

That is, with probability .90, the true signal sent lies within the interval (33.68, 53.82). ■
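The arithmetic of Example 7.8d can be reproduced in a few lines of Python. This sketch uses the exact normal quantile from `statistics.NormalDist` instead of the rounded value 1.645, which does not change the interval at two decimal places; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

received, channel_var = 40.0, 60.0   # value received at B; variance of reception
prior_mean, prior_var = 50.0, 100.0  # prior distribution of the value sent at A

# Posterior of S given the data (Example 7.8b with n = 1), written with precisions.
precision = 1.0 / channel_var + 1.0 / prior_var
post_mean = (received / channel_var + prior_mean / prior_var) / precision
post_var = 1.0 / precision

# Central 90 percent interval: z is the .95 quantile of the standard normal (about 1.645).
z = NormalDist().inv_cdf(0.95)
half_width = z * sqrt(post_var)
interval = (post_mean - half_width, post_mean + half_width)
print(post_mean, post_var, interval)
```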



Problems 

1. Let $X_1, \ldots, X_n$ be a sample from the distribution whose density function is

f (*) = \ „ , 

( otherwise 

Determine the maximum likelihood estimator of $\theta$.
2. Determine the maximum likelihood estimator of $\theta$ when $X_1, \ldots, X_n$ is a sample
with density function

$$f(x) = \tfrac{1}{2}e^{-|x-\theta|}, \qquad -\infty < x < \infty$$

3. Let $X_1, \ldots, X_n$ be a sample from a normal $(\mu, \sigma^2)$ population. Determine the maximum
likelihood estimator of $\sigma^2$ when $\mu$ is known. What is the expected value of
this estimator?

4. The height of a radio tower is to be measured by measuring both the horizontal
distance $L$ from the center of its base to a measuring instrument and the vertical
angle of the measuring device (see the following figure). If five measurements of
the distance $L$ give (in feet) values

150.42, 150.45, 150.49, 150.52, 150.40

and four measurements of the angle $\theta$ give (in degrees) values

40.26, 40.27, 40.29, 40.26

estimate the height of the tower.

[Figure: the tower, the horizontal distance $L$ from its base, and the angle $\theta$]

5. Suppose that $X_1, \ldots, X_n$ are normal with mean $\mu_1$; $Y_1, \ldots, Y_n$ are normal with
mean $\mu_2$; and $W_1, \ldots, W_n$ are normal with mean $\mu_1 + \mu_2$. Assuming that all $3n$
random variables are independent with a common variance, find the maximum
likelihood estimators of $\mu_1$ and $\mu_2$.

6. River floods are often measured by their discharges (in units of feet cubed per 
second). The value v is said to be the value of a 100-year flood if 

$$P\{D > v\} = .01$$

where D is the discharge of the largest flood in a randomly chosen year. The 
following table gives the flood discharges of the largest floods of the Blackstone 
River in Woonsocket, Rhode Island, in each of the years from 1929 to 1965. 
Assuming that these discharges follow a lognormal distribution, estimate the value 
of a 100-year flood. 
Annual Floods of the Blackstone River (1929-1965)

Year    Flood Discharge (ft³/s)
1929    4,570
1930    1,970
1931    8,220
1932    4,530
1933    5,780
1934    6,560
1935    7,500
1936    15,000
1937    6,340
1938    15,100
1939    3,840
1940    5,860
1941    4,480
1942    5,330
1943    5,310
1944    3,830
1945    3,410
1946    3,830
1947    3,150
1948    5,810
1949    2,030
1950    3,620
1951    4,920
1952    4,090
1953    5,570
1954    9,400
1955    32,900
1956    8,710
1957    3,850
1958    4,970
1959    5,398
1960    4,780
1961    4,020
1962    5,790
1963    4,510
1964    5,520
1965    5,300



7. A manufacturer of heat exchangers requires that the plate spacings of its exchangers
be between .240 and .260 inches. A quality control engineer sampled 20
exchangers and measured the spacing of the plates on each exchanger. If the
sample mean and sample standard deviation of these 20 measurements are .254
and .005, estimate the fraction of all exchangers whose plate spacings fall outside
the specified region. Assume that the plate spacings have a normal distribution.

8. An electric scale gives a reading equal to the true weight plus a random error that
is normally distributed with mean 0 and standard deviation $\sigma = .1$ mg. Suppose
that the results of five successive weighings of the same object are as follows: 3.142,
3.163, 3.155, 3.150, 3.141.

(a) Determine a 95 percent confidence interval estimate of the true weight. 

(b) Determine a 99 percent confidence interval estimate of the true weight. 

9. The PCB concentration of a fish caught in Lake Michigan was measured by a 
technique that is known to result in an error of measurement that is normally 
distributed with a standard deviation of .08 ppm (parts per million). Suppose the 
results of 10 independent measurements of this fish are 

11.2, 12.4, 10.8, 11.6, 12.5, 10.1, 11.0, 12.2, 12.4, 10.6 

(a) Give a 95 percent confidence interval for the PCB level of this fish. 

(b) Give a 95 percent lower confidence interval. 

(c) Give a 95 percent upper confidence interval. 

10. The standard deviation of test scores on a certain achievement test is 11.3. If a 
random sample of 81 students had a sample mean score of 74.6, find a 90 percent 
confidence interval estimate for the average score of all students. 

11. Let $X_1, \ldots, X_n, X_{n+1}$ be a sample from a normal population having an unknown
mean $\mu$ and variance 1. Let $\overline{X}_n = \sum_{i=1}^n X_i/n$ be the average of the first $n$ of them.

(a) What is the distribution of $X_{n+1} - \overline{X}_n$?

(b) If $\overline{X}_n = 4$, give an interval that, with 90 percent confidence, will contain the
value of $X_{n+1}$.

12. If $X_1, \ldots, X_n$ is a sample from a normal population whose mean $\mu$ is unknown
but whose variance $\sigma^2$ is known, show that $(-\infty, \overline{X} + z_\alpha \sigma/\sqrt{n})$ is a $100(1 - \alpha)$
percent lower confidence interval for $\mu$.

13. A sample of 20 cigarettes is tested to determine nicotine content and the average
value observed was 1.2 mg. Compute a 99 percent two-sided confidence interval for
the mean nicotine content of a cigarette if it is known that the standard deviation
of a cigarette's nicotine content is $\sigma = .2$ mg.

14. In Problem 13, suppose that the population variance is not known in advance 
of the experiment. If the sample variance is .04, compute a 99 percent two-sided 
confidence interval for the mean nicotine content. 

15. In Problem 14, compute a value c for which we can assert "with 99 percent 
confidence" that c is larger than the mean nicotine content of a cigarette. 
16. Suppose that when sampling from a normal population having an unknown
mean $\mu$ and unknown variance $\sigma^2$, we wish to determine a sample size $n$ so
as to guarantee that the resulting $100(1 - \alpha)$ percent confidence interval for $\mu$
will be of size no greater than $A$, for given values $\alpha$ and $A$. Explain how we can
approximately do this by a double sampling scheme that first takes a subsample
of size 30 and then chooses the total sample size by using the results of the first
subsample.

17. The following data resulted from 24 independent measurements of the melting 
point of lead. 



330°C     322°C     345°C     328.6°C   331°C     342°C
342.4°C   340.4°C   329.7°C   334°C     326.5°C   325.8°C
337.5°C   327.3°C   322.6°C   341°C     340°C     333°C
343.3°C   331°C     341°C     329.5°C   332.3°C   340°C



Assuming that the measurements can be regarded as constituting a normal sample 
whose mean is the true melting point of lead, determine a 95 percent two-sided 
confidence interval for this value. Also determine a 99 percent two-sided confidence 
interval. 

18. The following are scores on IQ tests of a random sample of 18 students at a large 
eastern university. 

130, 122, 119, 142, 136, 127, 120, 152, 141, 
132, 127, 118, 150, 141, 133, 137, 129, 142 

(a) Construct a 95 percent confidence interval estimate of the average IQ score 
of all students at the university. 

(b) Construct a 95 percent lower confidence interval estimate. 

(c) Construct a 95 percent upper confidence interval estimate. 

19. Suppose that a random sample of nine recently sold houses in a certain city resulted 
in a sample mean price of $222,000, with a sample standard deviation of $22,000. 
Give a 95 percent upper confidence interval for the mean price of all recently sold 
houses in this city. 

20. A company self-insures its large fleet of cars against collisions. To determine its 
mean repair cost per collision, it has randomly chosen a sample of 16 accidents. 
If the average repair cost in these accidents is $2,200 with a sample standard 
deviation of $800, find a 90 percent confidence interval estimate of the mean cost
per collision.

21. A standardized test is given annually to all sixth-grade students in the state of 
Washington. To determine the average score of students in her district, a school 
supervisor selects a random sample of 100 students. If the sample mean of these 
students' scores is 320 and the sample standard deviation is 16, give a 95 percent 
confidence interval estimate of the average score of students in that supervisor's 
district. 

22. Each of 20 science students independently measured the melting point of lead. 
The sample mean and sample standard deviation of these measurements were 
(in degrees centigrade) 330.2 and 15.4, respectively. Construct (a) a 95 percent 
and (b) a 99 percent confidence interval estimate of the true melting point of 
lead. 

23. A random sample of 300 CitiBank VISA cardholder accounts indicated a sam- 
ple mean debt of $1,220 with a sample standard deviation of $840. Construct 
a 95 percent confidence interval estimate of the average debt of all cardholders. 

24. In Problem 23, find the smallest value $v$ that, "with 90 percent confidence," exceeds
the average debt per cardholder.

25. Verify the formula given in Table 7.1 for the $100(1 - \alpha)$ percent lower confidence
interval for $\mu$ when $\sigma$ is unknown.

26. The range of a new type of mortar shell is being investigated. The observed ranges, 
in meters, of 20 such shells are as follows: 



2,100   1,984   2,072   1,898   1,950
1,992   2,096   2,103   2,043   2,218
2,244   2,206   2,210   2,152   1,962
2,007   2,018   2,106   1,938   1,956



Assuming that a shell's range is normally distributed, construct (a) a 95 percent 
and (b) a 99 percent two-sided confidence interval for the mean range of a shell. 
(c) Determine the largest value v that, "with 95 percent confidence," will be less 
than the mean range. 

27. Studies were conducted in Los Angeles to determine the carbon monoxide
concentration near freeways. The basic technique used was to capture air samples
in special bags and to then determine the carbon monoxide concentration by
using a spectrophotometer. The measurements in ppm (parts per million) over
a sampled period during the year were 102.2, 98.4, 104.1, 101, 102.2, 100.4,
98.6, 88.2, 78.8, 83, 84.7, 94.8, 105.1, 106.2, 111.2, 108.3, 105.2, 103.2, 99,
98.8. Compute a 95 percent two-sided confidence interval for the mean carbon
monoxide concentration.

28. A set of 10 determinations, by a method devised by the chemist Karl Fischer, of 
the percentage of water in a methanol solution yielded the following data. 

.50, .55, .53, .56, .54, 
.57, .52, .60, .55, .58 

Assuming normality, use these data to give a 95 percent confidence interval for the 
actual percentage. 

29. Suppose that $U_1, U_2, \ldots$ is a sequence of independent uniform (0, 1) random
variables, and define $N$ by

$$N = \min\{n : U_1 + \cdots + U_n > 1\}$$

That is, $N$ is the number of uniform (0, 1) random variables that need be summed
to exceed 1. Use random numbers to determine the value of 36 random variables
having the same distribution as $N$, then use these data to obtain a 95 percent
confidence interval estimate of $E[N]$. Based on this interval, guess the exact value
of $E[N]$.

30. An important issue for a retailer is to decide when to reorder stock from a supplier. 
A common policy used to make the decision is of a type called s, S: The retailer 
orders at the end of a period if the on-hand stock is less than s, and orders enough to 
bring the stock up to S. The appropriate values of s and S depend on different cost 
parameters, such as inventory holding costs and the profit per item sold, as well as 
the distribution of the demand during a period. Consequently, it is important for 
the retailer to collect data relating to the parameters of the demand distribution. 
Suppose that the following data give the numbers of a certain type of item sold in 
each of 30 weeks. 

14, 8, 12, 9, 5, 22, 15, 12, 16, 7, 10, 9, 15, 15, 12,
9, 11, 16, 8, 7, 15, 13, 9, 5, 18, 14, 10, 13, 7, 11

Assuming that the numbers sold each week are independent random variables from 
a common distribution, use the data to obtain a 95 percent confidence interval for 
the mean number sold in a week. 

31. A random sample of 16 full professors at a large private university yielded a sample 
mean annual salary of $90,450 with a sample standard deviation of $9,400. Deter- 
mine a 95 percent confidence interval of the average salary of all full professors at 
that university. 
32. Let $X_1, \ldots, X_n, X_{n+1}$ denote a sample from a normal population whose mean
$\mu$ and variance $\sigma^2$ are unknown. Suppose that we are interested in using the
observed values of $X_1, \ldots, X_n$ to determine an interval, called a prediction interval,
that we predict will contain the value of $X_{n+1}$ with $100(1 - \alpha)$ percent confidence.
Let $\overline{X}_n$ and $S_n^2$ be the sample mean and sample variance of $X_1, \ldots, X_n$.

(a) Determine the distribution of

$$X_{n+1} - \overline{X}_n$$

(b) Determine the distribution of

$$\frac{X_{n+1} - \overline{X}_n}{S_n\sqrt{1 + \frac{1}{n}}}$$

(c) Give the prediction interval for $X_{n+1}$.

(d) The interval in part (c) will contain the value of $X_{n+1}$ with $100(1 - \alpha)$ percent
confidence. Explain the meaning of this statement.

33. National Safety Council data show that the number of accidental deaths due to 
drowning in the United States in the years from 1990 to 1993 were (in units of 
one thousand) 5.2, 4.6, 4.3, 4.8. Use these data to give an interval that will, with 
95 percent confidence, contain the number of such deaths in 1994. 

34. The daily dissolved oxygen concentration for a water stream has been recorded 
over 30 days. If the sample average of the 30 values is 2.5 mg/liter and the sample 
standard deviation is 2.12 mg/liter, determine a value which, with 90 percent 
confidence, exceeds the mean daily concentration. 

35. Verify the formulas given in Table 7.1 for the $100(1 - \alpha)$ percent lower and upper
confidence intervals for $\sigma^2$.

36. The capacities (in ampere-hours) of 10 batteries were recorded as follows:

140, 136, 150, 144, 148, 152, 138, 141, 143, 151

(a) Estimate the population variance $\sigma^2$.

(b) Compute a 99 percent two-sided confidence interval for $\sigma^2$.

(c) Compute a value $v$ that enables us to state, with 90 percent confidence, that
$\sigma^2$ is less than $v$.

37. Find a 95 percent two-sided confidence interval for the variance of the diameter 
of a rivet based on the data given here. 



6.68   6.66   6.62   6.72   6.76   6.67
6.70   6.72   6.78   6.66   6.76   6.72
6.76   6.70   6.76   6.76   6.74   6.74
6.81   6.66   6.64   6.79   6.72   6.82
6.81   6.77   6.60   6.72   6.74   6.70
6.64   6.78   6.70   6.70   6.75   6.79



Assume a normal population. 

38. The following are independent samples from two normal populations, both of
which have the same standard deviation $\sigma$.

16, 17, 19, 20, 18 and 3, 4, 8

Use them to estimate $\sigma$.

39. The amount of beryllium in a substance is often determined by the use of a
photometric filtration method. If the weight of the beryllium is $\mu$, then the
value given by the photometric filtration method is normally distributed with
mean $\mu$ and standard deviation $\sigma$. A total of eight independent measurements of
3.180 mg of beryllium gave the following results.

3.166, 3.192, 3.175, 3.180, 3.182, 3.171, 3.184, 3.177

Use the preceding data to

(a) estimate $\sigma$;

(b) find a 90 percent confidence interval estimate of $\sigma$.

40. If $X_1, \ldots, X_n$ is a sample from a normal population, explain how to obtain a
$100(1 - \alpha)$ percent confidence interval for the population variance $\sigma^2$ when the
population mean $\mu$ is known. Explain in what sense knowledge of $\mu$ improves the
interval estimator compared with when it is unknown.

Repeat Problem 38 if it is known that the mean burning time is 53.6 seconds. 

41. A civil engineer wishes to measure the compressive strength of two different 
types of concrete. A random sample of 10 specimens of the first type yielded 
the following data (in psi) 

Type 1: 3,250, 3,268, 4,302, 3,184, 3,266,
3,297, 3,332, 3,502, 3,064, 3,116

whereas a sample of 10 specimens of the second yielded the data

Type 2: 3,094, 3,106, 3,004, 3,066, 2,984,
3,124, 3,316, 3,212, 3,380, 3,018

If we assume that the samples are normal with a common variance, determine

(a) a 95 percent two-sided confidence interval for $\mu_1 - \mu_2$, the difference in
means;

(b) a 95 percent one-sided upper confidence interval for $\mu_1 - \mu_2$;

(c) a 95 percent one-sided lower confidence interval for $\mu_1 - \mu_2$.

42. Independent random samples are taken from the output of two machines on
a production line. The weight of each item is of interest. From the first machine,
a sample of size 36 is taken, with sample mean weight of 120 grams and a sample
variance of 4. From the second machine, a sample of size 64 is taken, with
a sample mean weight of 130 grams and a sample variance of 5. It is assumed that
the weights of items from the first machine are normally distributed with mean
$\mu_1$ and variance $\sigma^2$, and that the weights of items from the second machine are
normally distributed with mean $\mu_2$ and variance $\sigma^2$ (that is, the variances are
assumed to be equal). Find a 99 percent confidence interval for $\mu_1 - \mu_2$, the
difference in population means.

43. Do Problem 42 when it is known in advance that the population variances are 
4 and 5. 

44. The following are the burning times in seconds of floating smoke pots of two 
different types. 

Type I        Type II

481   572     526   537
506   561     511   582
527   501     556   605
661   487     542   558
501   524     491   578

Find a 99 percent confidence interval for the mean difference in burning times 
assuming normality with unknown but equal variances. 

45. If $X_1, \ldots, X_n$ is a sample from a normal population having known mean $\mu_1$
and unknown variance $\sigma_1^2$, and $Y_1, \ldots, Y_m$ is an independent sample from
a normal population having known mean $\mu_2$ and unknown variance $\sigma_2^2$,
determine a $100(1 - \alpha)$ percent confidence interval for $\sigma_1^2/\sigma_2^2$.

46. Two analysts took repeated readings on the hardness of city water. Assuming 
that the readings of analyst i constitute a sample from a normal population 
having variance $\sigma_i^2$, $i = 1, 2$, compute a 95 percent two-sided confidence interval
for $\sigma_1^2/\sigma_2^2$ when the data are as follows:

Coded Measures of Hardness

Analyst 1    Analyst 2
.46          .82
.62          .61
.37          .89
.40          .51
.44          .33
.58          .48
.48          .23
.53          .25
             .67



47. A problem of interest in baseball is whether a sacrifice bunt is a good strategy 
when there is a man on first base and no outs. Assuming that the bunter will 
be out but will be successful in advancing the man on base, we could compare 
the probability of scoring a run with a player on first base and no outs with 
the probability of scoring a run with a player on second base and one out. The 
following data resulted from a study of randomly chosen major league baseball
games played in 1959 and 1960.

Base        Number     Number of Cases in       Total Number
Occupied    of Outs    Which Runs Are Scored    of Cases
First       0          1,044                    1,728
Second      1          401                      657

(a) Give a 95 percent confidence interval estimate for the probability of scoring
at least one run when there is a man on first and no outs.

(b) Give a 95 percent confidence interval estimate for the probability of scoring
at least one run when there is a man on second and one out.



48. A random sample of 1,200 engineers included 48 Hispanic Americans, 80 African 
Americans, and 204 females. Determine 90 percent confidence intervals for the 
proportion of all engineers that are 

(a) female; 

(b) Hispanic Americans or African Americans. 
49. To estimate p, the proportion of all newborn babies that are male, the gender of
10,000 newborn babies was noted. If 5,106 of them were male, determine (a) a 
90 percent and (b) a 99 percent confidence interval estimate of p. 

50. An airline is interested in determining the proportion of its customers who are 
flying for reasons of business. If they want to be 90 percent certain that their 
estimate will be correct to within 2 percent, how large a random sample should 
they select? 

51. A recent newspaper poll indicated that Candidate A is favored over Candidate B 
by a 53 to 47 percentage, with a margin of error of ±4 percent. The newspaper 
then stated that since the 6-point gap is larger than the margin of error, its readers 
can be certain that Candidate A is the current choice. Is this reasoning correct? 

52. A market research firm is interested in determining the proportion of households 
that are watching a particular sporting event. To accomplish this task, they plan 
on using a telephone poll of randomly chosen households. How large a sample is 
needed if they want to be 90 percent certain that their estimate is correct to within 
±.02? 

53. In a recent study, 79 of 140 meteorites were observed to enter the atmosphere 
with a velocity of less than 25 miles per second. If we take p = 79/140 as an 
estimate of the probability that an arbitrary meteorite that enters the atmosphere 
will have a speed less than 25 miles per second, what can we say, with 99 percent 
confidence, about the maximum error of our estimate? 

54. A random sample of 100 items from a production line revealed 17 of them to be 
defective. Compute a 95 percent two-sided confidence interval for the probability 
that an item produced is defective. Determine also a 99 percent upper confidence 
interval for this value. What assumptions are you making? 

55. Of 100 randomly detected cases of individuals having lung cancer, 67 died within 
5 years of detection. 

(a) Estimate the probability that a person contracting lung cancer will die within 
5 years. 

(b) How large an additional sample would be required to be 95 percent confident 
that the error in estimating the probability in part (a) is less than .02? 

56. Derive $100(1-\alpha)$ percent lower and upper confidence intervals for $p$, when the data consist of the values of $n$ independent Bernoulli random variables with parameter $p$.

57. Suppose the lifetimes of batteries are exponentially distributed with mean $\theta$. If the average of a sample of 10 batteries is 36 hours, determine a 95 percent two-sided confidence interval for $\theta$.
58. Determine $100(1-\alpha)$ percent one-sided upper and lower confidence intervals for $\theta$ in Problem 57.

59. Let $X_1, X_2, \ldots, X_n$ denote a sample from a population whose mean value $\theta$ is unknown. Use the results of Example 7.7b to argue that among all unbiased estimators of $\theta$ of the form $\sum_{i=1}^n \lambda_i X_i$, $\sum_{i=1}^n \lambda_i = 1$, the one with minimal mean square error has $\lambda_i = 1/n$, $i = 1, \ldots, n$.

60. Consider two independent samples from normal populations having the same variance $\sigma^2$, of respective sizes $n$ and $m$. That is, $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_m$ are independent samples from normal populations each having variance $\sigma^2$. Let $S_x^2$ and $S_y^2$ denote the respective sample variances. Thus both $S_x^2$ and $S_y^2$ are unbiased estimators of $\sigma^2$. Show by using the results of Example 7.7b along with the fact that

$$\operatorname{Var}(\chi_k^2) = 2k$$

where $\chi_k^2$ is chi-square with $k$ degrees of freedom, that the minimum mean square error estimator of $\sigma^2$ of the form $\lambda S_x^2 + (1-\lambda)S_y^2$ is

$$S_p^2 = \frac{(n-1)S_x^2 + (m-1)S_y^2}{n+m-2}$$

This is called the pooled estimator of $\sigma^2$.

61. Consider two estimators $d_1$ and $d_2$ of a parameter $\theta$. If $E[d_1] = \theta$, $\operatorname{Var}(d_1) = 6$ and $E[d_2] = \theta + 2$, $\operatorname{Var}(d_2) = 2$, which estimator should be preferred?

62. Suppose that the number of accidents occurring daily in a certain plant has a Poisson distribution with an unknown mean $\lambda$. Based on previous experience in similar industrial plants, suppose that a statistician's initial feelings about the possible value of $\lambda$ can be expressed by an exponential distribution with parameter 1. That is, the prior density is

$$p(\lambda) = e^{-\lambda}, \quad 0 < \lambda < \infty$$

Determine the Bayes estimate of $\lambda$ if there are a total of 83 accidents over the next 10 days. What is the maximum likelihood estimate?

63. The functional lifetimes in hours of computer chips produced by a certain semiconductor firm are exponentially distributed with mean $1/\lambda$. Suppose that the prior distribution on $\lambda$ is the gamma distribution with density function

$$g(x) = \frac{e^{-x}x^2}{2}, \quad 0 < x < \infty$$

If the average life of the first 20 chips tested is 4.6 hours, compute the Bayes estimate of $\lambda$.

64. Each item produced will, independently, be defective with probability p. If the 
prior distribution on p is uniform on (0, 1), compute the posterior probability 
that p is less than .2 given 

(a) a total of 2 defectives out of a sample of size 10; 

(b) a total of 1 defective out of a sample of size 10;

(c) a total of 10 defectives out of a sample of size 10. 

65. The breaking strength of a certain type of cloth is to be measured for 10 specimens. The underlying distribution is normal with unknown mean $\theta$ but with a standard deviation equal to 3 psi. Suppose also that based on previous experience we feel that the unknown mean has a prior distribution that is normally distributed with mean 200 and standard deviation 2. If the average breaking strength of a sample of 20 specimens is 182 psi, determine a region that contains $\theta$ with probability .95.
8.1 INTRODUCTION

As in the previous chapter, let us suppose that a random sample from a population distribution, specified except for a vector of unknown parameters, is to be observed. However, rather than wishing to explicitly estimate the unknown parameters, let us now suppose that we are primarily concerned with using the resulting sample to test some particular hypothesis concerning them. As an illustration, suppose that a construction firm has just purchased a large supply of cables that have been guaranteed to have an average breaking strength of at least 7,000 psi. To verify this claim, the firm has decided to take a random sample of 10 of these cables to determine their breaking strengths. They will then use the result of this experiment to decide whether or not to accept the cable manufacturer's hypothesis that the population mean is at least 7,000 pounds per square inch.

A statistical hypothesis is usually a statement about a set of parameters of a population distribution. It is called a hypothesis because it is not known whether or not it is true. A primary problem is to develop a procedure for determining whether or not the values of a random sample from this population are consistent with the hypothesis. For instance, consider a particular normally distributed population having an unknown mean value $\theta$ and known variance 1. The statement "$\theta$ is less than 1" is a statistical hypothesis that we could try to test by observing a random sample from this population. If the random sample is deemed to be consistent with the hypothesis under consideration, we say that the hypothesis has been "accepted"; otherwise we say that it has been "rejected."

Note that in accepting a given hypothesis we are not actually claiming that it is true but rather we are saying that the resulting data appear to be consistent with it. For instance, in the case of a normal $(\theta, 1)$ population, if a resulting sample of size 10 has an average value of 1.25, then although such a result cannot be regarded as being evidence in favor of the hypothesis "$\theta < 1$," it is not inconsistent with this hypothesis, which would thus be accepted. On the other hand, if the sample of size 10 has an average value of 3, then even though a sample value that large is possible when $\theta < 1$, it is so unlikely that it seems inconsistent with this hypothesis, which would thus be rejected.
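The contrast between an average of 1.25 and an average of 3 can be made concrete. The following sketch uses Python's standard-library `statistics.NormalDist` and assumes the boundary value $\theta = 1$ (for any $\theta < 1$ both probabilities are smaller still); it computes how likely each average is for a sample of size 10 from a normal $(\theta, 1)$ population.

```python
from math import sqrt
from statistics import NormalDist

# Assumption: theta = 1, the boundary of the hypothesis "theta < 1"; variance 1.
theta, sigma, n = 1.0, 1.0, 10

# The average of n independent normal (theta, 1) values is normal (theta, 1/n).
xbar_dist = NormalDist(mu=theta, sigma=sigma / sqrt(n))

# P{Xbar >= a} for the two observed averages discussed in the text.
tails = {a: 1.0 - xbar_dist.cdf(a) for a in (1.25, 3.0)}
for a, p in tails.items():
    print(f"P(Xbar >= {a}) = {p:.3g}")
```

An average of 1.25 or more turns up roughly 21 percent of the time, whereas an average of 3 or more has probability on the order of $10^{-10}$.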
8.2 SIGNIFICANCE LEVELS

Consider a population having distribution $F_\theta$, where $\theta$ is unknown, and suppose we want to test a specific hypothesis about $\theta$. We shall denote this hypothesis by $H_0$ and call it the null hypothesis. For example, if $F_\theta$ is a normal distribution function with mean $\theta$ and variance equal to 1, then two possible null hypotheses about $\theta$ are

(a) $H_0: \theta = 1$

(b) $H_0: \theta \le 1$

Thus the first of these hypotheses states that the population is normal with mean 1 and variance 1, whereas the second states that it is normal with variance 1 and a mean less than or equal to 1. Note that the null hypothesis in (a), when true, completely specifies the population distribution, whereas the null hypothesis in (b) does not. A hypothesis that, when true, completely specifies the population distribution is called a simple hypothesis; one that does not is called a composite hypothesis.

Suppose now that in order to test a specific null hypothesis $H_0$, a population sample of size $n$, say $X_1, \ldots, X_n$, is to be observed. Based on these $n$ values, we must decide whether or not to accept $H_0$. A test for $H_0$ can be specified by defining a region $C$ in $n$-dimensional space with the proviso that the hypothesis is to be rejected if the random sample $X_1, \ldots, X_n$ turns out to lie in $C$ and accepted otherwise. The region $C$ is called the critical region. In other words, the statistical test determined by the critical region $C$ is the one that

$$\text{accepts } H_0 \text{ if } (X_1, X_2, \ldots, X_n) \notin C$$

and

$$\text{rejects } H_0 \text{ if } (X_1, \ldots, X_n) \in C$$

For instance, a common test of the hypothesis that $\theta$, the mean of a normal population with variance 1, is equal to 1 has a critical region given by

$$C = \left\{ (X_1, \ldots, X_n) : \left| \frac{\sum_{i=1}^n X_i}{n} - 1 \right| > \frac{1.96}{\sqrt{n}} \right\} \tag{8.2.1}$$

Thus, this test calls for rejection of the null hypothesis that $\theta = 1$ when the sample average differs from 1 by more than 1.96 divided by the square root of the sample size.
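As a minimal sketch, the critical region of Equation 8.2.1 amounts to one comparison of the sample average with $1 \pm 1.96/\sqrt{n}$. The function name and the two small samples below are made up purely for illustration.

```python
from math import sqrt
from statistics import fmean

def in_critical_region(sample, theta0=1.0, cutoff=1.96):
    """True if the sample lies in the critical region C of Equation 8.2.1.

    Assumes, as in the text, a normal population with variance 1 and H0: theta = 1.
    """
    n = len(sample)
    return abs(fmean(sample) - theta0) > cutoff / sqrt(n)

# Two made-up samples of size 4, for illustration only (cutoff 1.96/2 = .98).
print(in_critical_region([1.2, 0.8, 1.5, 0.9]))  # average 1.1: accept H0
print(in_critical_region([2.3, 2.0, 2.6, 1.9]))  # average 2.2: reject H0
```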

It is important to note when developing a procedure for testing a given null hypothesis $H_0$ that, in any test, two different types of errors can result. The first of these, called a type I error, is said to result if the test incorrectly calls for rejecting $H_0$ when it is indeed correct. The second, called a type II error, results if the test calls for accepting $H_0$ when it is false.



8.3 Tests Concerning the Mean of a Normal Population 293 



Now, as was previously mentioned, the objective of a statistical test of $H_0$ is not to explicitly determine whether or not $H_0$ is true but rather to determine if its validity is consistent with the resultant data. Hence, with this objective it seems reasonable that $H_0$ should only be rejected if the resultant data are very unlikely when $H_0$ is true. The classical way of accomplishing this is to specify a value $\alpha$ and then require the test to have the property that whenever $H_0$ is true its probability of being rejected is never greater than $\alpha$. The value $\alpha$, called the level of significance of the test, is usually set in advance, with commonly chosen values being $\alpha = .1, .05, .005$. In other words, the classical approach to testing $H_0$ is to fix a significance level $\alpha$ and then require that the test have the property that the probability of a type I error occurring can never be greater than $\alpha$.

Suppose now that we are interested in testing a certain hypothesis concerning $\theta$, an unknown parameter of the population. Specifically, for a given set of parameter values $w$, suppose we are interested in testing

$$H_0: \theta \in w$$

A common approach to developing a test of $H_0$, say at level of significance $\alpha$, is to start by determining a point estimator of $\theta$, say $d(\mathbf{X})$. The hypothesis is then rejected if $d(\mathbf{X})$ is "far away" from the region $w$. However, to determine how "far away" it need be to justify rejection of $H_0$, we need to determine the probability distribution of $d(\mathbf{X})$ when $H_0$ is true, since this will usually enable us to determine the appropriate critical region so as to make the test have the required significance level $\alpha$. For example, the test of the hypothesis that the mean of a normal $(\theta, 1)$ population is equal to 1, given by Equation 8.2.1, calls for rejection when the point estimate of $\theta$ (that is, the sample average) is farther than $1.96/\sqrt{n}$ away from 1. As we will see in the next section, the value $1.96/\sqrt{n}$ was chosen to meet a level of significance of $\alpha = .05$.
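The claim that the cutoff $1.96/\sqrt{n}$ yields significance level .05 can be checked by simulation. This is a Monte Carlo sketch, not an exact calculation; the sample size 10, the number of trials, and the seed are arbitrary choices made for illustration.

```python
import random
from math import sqrt

def rejection_rate(theta, n=10, trials=20_000, seed=1):
    """Monte Carlo estimate of P_theta{reject H0: theta = 1} for the test of Equation 8.2.1.

    Samples are drawn from a normal (theta, 1) population; trials and seed are arbitrary.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        xbar = sum(rng.gauss(theta, 1.0) for _ in range(n)) / n
        if abs(xbar - 1.0) > 1.96 / sqrt(n):
            rejections += 1
    return rejections / trials

print(rejection_rate(theta=1.0))  # H0 true: close to .05 (the type I error probability)
print(rejection_rate(theta=2.0))  # H0 false: rejection is now very likely
```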

8.3 TESTS CONCERNING THE MEAN OF A 
NORMAL POPULATION 

8.3.1 Case of Known Variance

Suppose that $X_1, \ldots, X_n$ is a sample of size $n$ from a normal distribution having an unknown mean $\mu$ and a known variance $\sigma^2$ and suppose we are interested in testing the null hypothesis

$$H_0: \mu = \mu_0$$

against the alternative hypothesis

$$H_1: \mu \ne \mu_0$$

where $\mu_0$ is some specified constant.
Since $\bar{X} = \sum_{i=1}^n X_i/n$ is a natural point estimator of $\mu$, it seems reasonable to accept $H_0$ if $\bar{X}$ is not too far from $\mu_0$. That is, the critical region of the test would be of the form

$$C = \{X_1, \ldots, X_n : |\bar{X} - \mu_0| > c\} \tag{8.3.1}$$

for some suitably chosen value $c$.

If we desire that the test has significance level $\alpha$, then we must determine the critical value $c$ in Equation 8.3.1 that will make the type I error equal to $\alpha$. That is, $c$ must be such that

$$P_{\mu_0}\{|\bar{X} - \mu_0| > c\} = \alpha \tag{8.3.2}$$

where we write $P_{\mu_0}$ to mean that the preceding probability is to be computed under the assumption that $\mu = \mu_0$. However, when $\mu = \mu_0$, $\bar{X}$ will be normally distributed with mean $\mu_0$ and variance $\sigma^2/n$ and so $Z$, defined by

$$Z \equiv \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$

will have a standard normal distribution. Now Equation 8.3.2 is equivalent to

$$P\left\{|Z| > \frac{c\sqrt{n}}{\sigma}\right\} = \alpha$$

or, equivalently,

$$2P\left\{Z > \frac{c\sqrt{n}}{\sigma}\right\} = \alpha$$

where $Z$ is a standard normal random variable. However, we know that

$$P\{Z > z_{\alpha/2}\} = \alpha/2$$

and so

$$\frac{c\sqrt{n}}{\sigma} = z_{\alpha/2}$$

or

$$c = \frac{z_{\alpha/2}\,\sigma}{\sqrt{n}}$$

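Thus, the significance level $\alpha$ test is to reject $H_0$ if $|\bar{X} - \mu_0| > z_{\alpha/2}\sigma/\sqrt{n}$ and accept otherwise; or, equivalently, to

$$\begin{aligned}
&\text{reject } H_0 && \text{if } \frac{\sqrt{n}}{\sigma}|\bar{X} - \mu_0| > z_{\alpha/2}\\
&\text{accept } H_0 && \text{if } \frac{\sqrt{n}}{\sigma}|\bar{X} - \mu_0| \le z_{\alpha/2}
\end{aligned} \tag{8.3.3}$$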
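Equation 8.3.3 translates directly into code. In the sketch below (Python, standard library only), $z_{\alpha/2}$ is obtained from `NormalDist().inv_cdf` rather than from a table. The function name and the numbers in the usage line are illustrative assumptions, not taken from the text.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_z_test(xbar, mu0, sigma, n, alpha=0.05):
    """Level-alpha test of H0: mu = mu0 versus H1: mu != mu0, sigma known (Equation 8.3.3).

    Returns (statistic, z_{alpha/2}, reject), where statistic = sqrt(n)|xbar - mu0|/sigma.
    """
    z_half_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # P{Z > z_{alpha/2}} = alpha/2
    statistic = sqrt(n) * abs(xbar - mu0) / sigma
    return statistic, z_half_alpha, statistic > z_half_alpha

# Hypothetical data: n = 25, sigma = 1, observed average 10.3, testing mu0 = 10.
stat, cutoff, reject = two_sided_z_test(xbar=10.3, mu0=10.0, sigma=1.0, n=25)
print(f"statistic {stat:.2f}, cutoff {cutoff:.2f}, reject: {reject}")
```

Here the statistic is 1.5, which is below $z_{.025} \approx 1.96$, so $H_0$ is accepted.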
FIGURE 8.1



This can be pictorially represented as shown in Figure 8.1, where we have superimposed the standard normal density function [which is the density of the test statistic $\sqrt{n}(\bar{X} - \mu_0)/\sigma$ when $H_0$ is true].

EXAMPLE 8.3a It is known that if a signal of value $\mu$ is sent from location A, then the value received at location B is normally distributed with mean $\mu$ and standard deviation 2. That is, the random noise added to the signal is an $N(0, 4)$ random variable. There is reason for the people at location B to suspect that the signal value $\mu = 8$ will be sent today. Test this hypothesis if the same signal value is independently sent five times and the average value received at location B is $\bar{X} = 9.5$.

SOLUTION Suppose we are testing at the 5 percent level of significance. To begin, we compute the test statistic

$$\frac{\sqrt{n}}{\sigma}|\bar{X} - \mu_0| = \frac{\sqrt{5}}{2}(1.5) = 1.68$$

Since this value is less than $z_{.025} = 1.96$, the hypothesis is accepted. In other words, the data are not inconsistent with the null hypothesis in the sense that a sample average as far from the value 8 as observed would be expected, when the true mean is 8, over 5 percent of the time. Note, however, that if a less stringent significance level were chosen, say $\alpha = .1$, then the null hypothesis would have been rejected. This follows since $z_{.05} = 1.645$, which is less than 1.68. Hence, had we chosen a test that had a 10 percent chance of rejecting $H_0$ when $H_0$ was true, the null hypothesis would have been rejected.

The "correct" level of significance to use in a given situation depends on the individual circumstances involved in that situation. For instance, if rejecting a null hypothesis $H_0$ would result in large costs that would thus be lost if $H_0$ were indeed true, then we might elect to be quite conservative and so choose a significance level of .05 or .01. Also, if we initially feel strongly that $H_0$ is correct, then we would require very stringent data evidence to the contrary for us to reject $H_0$. (That is, we would set a very low significance level in this situation.) ■
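The arithmetic of Example 8.3a, including the switch in verdict between $\alpha = .05$ and $\alpha = .1$, can be reproduced with a few lines of Python. This sketch computes the critical values with `statistics.NormalDist` instead of reading them from a table.

```python
from math import sqrt
from statistics import NormalDist

# Data of Example 8.3a.
n, sigma, mu0, xbar = 5, 2.0, 8.0, 9.5
statistic = sqrt(n) * abs(xbar - mu0) / sigma      # about 1.68

decisions = {}
for alpha in (0.05, 0.10):
    cutoff = NormalDist().inv_cdf(1 - alpha / 2)    # z_.025 = 1.96, z_.05 = 1.645
    decisions[alpha] = statistic > cutoff
    verdict = "reject" if decisions[alpha] else "accept"
    print(f"alpha = {alpha:.2f}: {statistic:.3f} vs {cutoff:.3f} -> {verdict} H0")
```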
The test given by Equation 8.3.3 can be described as follows: For any observed value of the test statistic $\sqrt{n}|\bar{X} - \mu_0|/\sigma$, call it $v$, the test calls for rejection of the null hypothesis if the probability that the test statistic would be as large as $v$ when $H_0$ is true is less than or equal to the significance level $\alpha$. From this, it follows that we can determine whether or not to accept the null hypothesis by computing, first, the value of the test statistic and, second, the probability that a standard normal would (in absolute value) exceed that quantity. This probability, called the p-value of the test, gives the critical significance level in the sense that $H_0$ will be accepted if the significance level $\alpha$ is less than the p-value and rejected if it is greater than or equal to it.

In practice, the significance level is often not set in advance but rather the data are 
looked at to determine the resultant p-value. Sometimes, this critical significance level is 
clearly much larger than any we would want to use, and so the null hypothesis can be 
readily accepted. At other times the p-value is so small that it is clear that the hypothesis 
should be rejected. 

EXAMPLE 8.3b In Example 8.3a, suppose that the average of the 5 values received is $\bar{X} = 8.5$. In this case,

$$\frac{\sqrt{n}}{\sigma}|\bar{X} - \mu_0| = \frac{\sqrt{5}}{4} = .559$$

Since

$$P\{|Z| > .559\} = 2P\{Z > .559\} = 2 \times .288 = .576$$

it follows that the p-value is .576 and thus the null hypothesis $H_0$ that the signal sent has value 8 would be accepted at any significance level $\alpha < .576$. Since we would clearly never want to test a null hypothesis using a significance level as large as .576, $H_0$ would be accepted.

On the other hand, if the average of the data values were 11.5, then the p-value of the test that the mean is equal to 8 would be

$$P\{|Z| > 1.75\sqrt{5}\} = P\{|Z| > 3.913\} \approx .00009$$

For such a small p-value, the hypothesis that the value 8 was sent is rejected. ■
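The two p-values of Example 8.3b can be computed directly as $2P\{Z \ge v\}$. The sketch below is a minimal Python version using `statistics.NormalDist`. For the average 11.5 it returns about .00009, which is twice the single tail $P\{Z > 3.913\} \approx .00005$, as the two-sided definition requires.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p_value(xbar, mu0, sigma, n):
    """p-value of the test in Equation 8.3.3: P{|Z| >= v}, v = sqrt(n)|xbar - mu0|/sigma."""
    v = sqrt(n) * abs(xbar - mu0) / sigma
    return 2 * (1 - NormalDist().cdf(v))

p_close = two_sided_p_value(8.5, 8, 2, 5)    # Example 8.3b, average 8.5
p_far = two_sided_p_value(11.5, 8, 2, 5)     # Example 8.3b, average 11.5
print(f"{p_close:.3f}  {p_far:.2e}")
```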

We have not yet talked about the probability of a type II error, that is, the probability of accepting the null hypothesis when the true mean $\mu$ is unequal to $\mu_0$. This probability will depend on the value of $\mu$, and so let us define $\beta(\mu)$ by

$$\beta(\mu) = P_\mu\{\text{acceptance of } H_0\} = P_\mu\left\{-z_{\alpha/2} \le \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \le z_{\alpha/2}\right\}$$

The function $\beta(\mu)$ is called the operating characteristic (or OC) curve and represents the probability that $H_0$ will be accepted when the true mean is $\mu$.

To compute this probability, we use the fact that $\bar{X}$ is normal with mean $\mu$ and variance $\sigma^2/n$ and so

$$Z \equiv \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$

Hence,

$$\begin{aligned}
\beta(\mu) &= P_\mu\left\{-z_{\alpha/2} \le \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \le z_{\alpha/2}\right\}\\
&= P_\mu\left\{\frac{\mu_0 - \mu}{\sigma/\sqrt{n}} - z_{\alpha/2} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le \frac{\mu_0 - \mu}{\sigma/\sqrt{n}} + z_{\alpha/2}\right\}\\
&= P\left\{\frac{\mu_0 - \mu}{\sigma/\sqrt{n}} - z_{\alpha/2} \le Z \le \frac{\mu_0 - \mu}{\sigma/\sqrt{n}} + z_{\alpha/2}\right\}\\
&= \Phi\left(\frac{\mu_0 - \mu}{\sigma/\sqrt{n}} + z_{\alpha/2}\right) - \Phi\left(\frac{\mu_0 - \mu}{\sigma/\sqrt{n}} - z_{\alpha/2}\right)
\end{aligned} \tag{8.3.4}$$

where $\Phi$ is the standard normal distribution function.

For a fixed significance level $\alpha$, the OC curve given by Equation 8.3.4 is symmetric about $\mu_0$ and indeed will depend on $\mu$ only through $(\sqrt{n}/\sigma)|\mu - \mu_0|$. This curve with the abscissa changed from $\mu$ to $d = (\sqrt{n}/\sigma)|\mu - \mu_0|$ is presented in Figure 8.2 when $\alpha = .05$.

EXAMPLE 8.3c For the problem presented in Example 8.3a, let us determine the probability of accepting the null hypothesis that $\mu = 8$ when the actual value sent is 10. To do so, we compute

$$\frac{\sqrt{n}}{\sigma}(\mu_0 - \mu) = -\frac{\sqrt{5}}{2} \times 2 = -\sqrt{5}$$
FIGURE 8.2 The OC curve for the two-sided normal test for significance level $\alpha = .05$ (horizontal axis: $d = (\sqrt{n}/\sigma)|\mu - \mu_0|$).

As $z_{.025} = 1.96$, the desired probability is, from Equation 8.3.4,

$$\begin{aligned}
\Phi(-\sqrt{5} + 1.96) - \Phi(-\sqrt{5} - 1.96) &= 1 - \Phi(\sqrt{5} - 1.96) - [1 - \Phi(\sqrt{5} + 1.96)]\\
&= \Phi(4.196) - \Phi(.276)\\
&\approx .391
\end{aligned}$$

■
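Equation 8.3.4 is easy to evaluate numerically. The following Python sketch (standard library only; the function name is our own) reproduces Example 8.3c, giving about .39. It also checks two properties stated above: $\beta(\mu_0) = 1 - \alpha$, and symmetry of the curve about $\mu_0$.

```python
from math import sqrt
from statistics import NormalDist

def oc_two_sided(mu, mu0, sigma, n, alpha=0.05):
    """beta(mu) from Equation 8.3.4: probability the two-sided test accepts H0: mu = mu0."""
    Phi = NormalDist().cdf
    z = NormalDist().inv_cdf(1 - alpha / 2)
    shift = sqrt(n) * (mu0 - mu) / sigma
    return Phi(shift + z) - Phi(shift - z)

print(round(oc_two_sided(10, mu0=8, sigma=2, n=5), 3))  # Example 8.3c
print(round(oc_two_sided(8, mu0=8, sigma=2, n=5), 3))   # at mu = mu0 this is 1 - alpha
print(round(oc_two_sided(6, mu0=8, sigma=2, n=5), 3))   # symmetric about mu0
```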



REMARK 

The function $1 - \beta(\mu)$ is called the power function of the test. Thus, for a given value $\mu$, the power of the test is equal to the probability of rejection when $\mu$ is the true value. ■

The operating characteristic function is useful in determining how large the random sample need be to meet certain specifications concerning type II errors. For instance, suppose that we desire to determine the sample size $n$ necessary to ensure that the probability of accepting $H_0: \mu = \mu_0$ when the true mean is actually $\mu_1$ is approximately $\beta$. That is, we want $n$ to be such that

$$\beta(\mu_1) \approx \beta$$

But from Equation 8.3.4, this is equivalent to

$$\Phi\left(\frac{\sqrt{n}(\mu_0 - \mu_1)}{\sigma} + z_{\alpha/2}\right) - \Phi\left(\frac{\sqrt{n}(\mu_0 - \mu_1)}{\sigma} - z_{\alpha/2}\right) \approx \beta \tag{8.3.5}$$



Although the foregoing cannot be analytically solved for $n$, a solution can be obtained by using the standard normal distribution table. In addition, an approximation for $n$ can be derived from Equation 8.3.5 as follows. To start, suppose that $\mu_1 > \mu_0$. Then, because this implies that

$$\frac{\mu_0 - \mu_1}{\sigma/\sqrt{n}} - z_{\alpha/2} \le -z_{\alpha/2}$$

it follows, since $\Phi$ is an increasing function, that

$$\Phi\left(\frac{\mu_0 - \mu_1}{\sigma/\sqrt{n}} - z_{\alpha/2}\right) \le \Phi(-z_{\alpha/2}) = P\{Z \le -z_{\alpha/2}\} = P\{Z \ge z_{\alpha/2}\} = \alpha/2$$

Hence, we can take

$$\Phi\left(\frac{\mu_0 - \mu_1}{\sigma/\sqrt{n}} - z_{\alpha/2}\right) \approx 0$$

and so from Equation 8.3.5

$$\beta \approx \Phi\left(\frac{\mu_0 - \mu_1}{\sigma/\sqrt{n}} + z_{\alpha/2}\right) \tag{8.3.6}$$

or, since

$$\beta = P\{Z > z_\beta\} = P\{Z < -z_\beta\} = \Phi(-z_\beta)$$

we obtain from Equation 8.3.6 that

$$-z_\beta \approx (\mu_0 - \mu_1)\frac{\sqrt{n}}{\sigma} + z_{\alpha/2}$$

or

$$n \approx \frac{(z_{\alpha/2} + z_\beta)^2\sigma^2}{(\mu_1 - \mu_0)^2} \tag{8.3.7}$$

In fact, the same approximation would result when $\mu_1 < \mu_0$ (the details are left as an exercise) and so Equation 8.3.7 is in all cases a reasonable approximation to the sample size necessary to ensure that the type II error at the value $\mu = \mu_1$ is approximately equal to $\beta$.

EXAMPLE 8.3d For the problem of Example 8.3a, how many signals need be sent so that the .05 level test of $H_0: \mu = 8$ has at least a 75 percent probability of rejection when $\mu = 9.2$?

SOLUTION Since $z_{.025} = 1.96$, $z_{.25} = .67$, the approximation 8.3.7 yields

$$n \approx \frac{(1.96 + .67)^2}{(1.2)^2} \times 4 = 19.21$$

Hence a sample of size 20 is needed. From Equation 8.3.4, we see that with $n = 20$

$$\begin{aligned}
\beta(9.2) &= \Phi\left(-\frac{1.2\sqrt{20}}{2} + 1.96\right) - \Phi\left(-\frac{1.2\sqrt{20}}{2} - 1.96\right)\\
&= \Phi(-.723) - \Phi(-4.643)\\
&\approx 1 - \Phi(.723)\\
&\approx .235
\end{aligned}$$

Therefore, if the message is sent 20 times, then there is a 76.5 percent chance that the null hypothesis $\mu = 8$ will be rejected when the true mean is 9.2. ■
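The approximation 8.3.7 can be compared with a direct search on Equation 8.3.4. This search is a numerical stand-in for the table lookup mentioned in the text. The Python sketch below (standard library only; all function names are our own) applies both to Example 8.3d. The approximation evaluates to about 19.3 rather than 19.21, because $z_{.25}$ is not rounded to .67. Both routes lead to $n = 20$, with power about .765.

```python
from math import ceil, sqrt
from statistics import NormalDist

Phi = NormalDist().cdf

def z_upper(p):
    """z_p, defined by P{Z > z_p} = p."""
    return NormalDist().inv_cdf(1 - p)

def oc_two_sided(mu, mu0, sigma, n, alpha):
    """beta(mu) from Equation 8.3.4."""
    shift = sqrt(n) * (mu0 - mu) / sigma
    return Phi(shift + z_upper(alpha / 2)) - Phi(shift - z_upper(alpha / 2))

def approx_sample_size(mu0, mu1, sigma, alpha, beta):
    """Equation 8.3.7: n ~ (z_{alpha/2} + z_beta)^2 sigma^2 / (mu1 - mu0)^2."""
    return (z_upper(alpha / 2) + z_upper(beta)) ** 2 * sigma**2 / (mu1 - mu0) ** 2

def exact_sample_size(mu0, mu1, sigma, alpha, beta):
    """Smallest n with beta(mu1) <= beta, found by direct search on Equation 8.3.4."""
    n = 1
    while oc_two_sided(mu1, mu0, sigma, n, alpha) > beta:
        n += 1
    return n

# Example 8.3d: mu0 = 8, mu1 = 9.2, sigma = 2, alpha = .05, beta = .25.
n_approx = approx_sample_size(8, 9.2, 2, alpha=0.05, beta=0.25)
n_exact = exact_sample_size(8, 9.2, 2, alpha=0.05, beta=0.25)
power = 1 - oc_two_sided(9.2, 8, 2, n_exact, alpha=0.05)
print(f"approximation {n_approx:.2f}, exact n = {n_exact}, power at n = {n_exact}: {power:.3f}")
```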

ONE-SIDED TESTS 

In testing the null hypothesis that $\mu = \mu_0$, we have chosen a test that calls for rejection when $\bar{X}$ is far from $\mu_0$. That is, a very small value of $\bar{X}$ or a very large value appears to make it unlikely that $\mu$ (which $\bar{X}$ is estimating) could equal $\mu_0$. However, what happens when the only alternative to $\mu$ being equal to $\mu_0$ is for $\mu$ to be greater than $\mu_0$? That is, what happens when the alternative hypothesis to $H_0: \mu = \mu_0$ is $H_1: \mu > \mu_0$? Clearly, in this latter case we would not want to reject $H_0$ when $\bar{X}$ is small (since a small $\bar{X}$ is more likely when $H_0$ is true than when $H_1$ is true). Thus, in testing

$$H_0: \mu = \mu_0 \quad \text{versus} \quad H_1: \mu > \mu_0 \tag{8.3.8}$$

we should reject $H_0$ when $\bar{X}$, the point estimate of $\mu$, is much greater than $\mu_0$. That is, the critical region should be of the following form:

$$C = \{(X_1, \ldots, X_n) : \bar{X} - \mu_0 > c\}$$

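Since the probability of rejection should equal $\alpha$ when $H_0$ is true (that is, when $\mu = \mu_0$), we require that $c$ be such that

$$P_{\mu_0}\{\bar{X} - \mu_0 > c\} = \alpha \tag{8.3.9}$$

But since

$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$

has a standard normal distribution when $H_0$ is true, Equation 8.3.9 is equivalent to

$$P\left\{Z > \frac{c\sqrt{n}}{\sigma}\right\} = \alpha$$

when $Z$ is a standard normal. But since

$$P\{Z > z_\alpha\} = \alpha$$

we see that

$$c = \frac{z_\alpha \sigma}{\sqrt{n}}$$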




Hence, the test of the hypothesis 8.3.8 is to reject $H_0$ if $\bar{X} - \mu_0 > z_\alpha\sigma/\sqrt{n}$, and accept otherwise; or, equivalently, to

$$\begin{aligned}
&\text{accept } H_0 && \text{if } \frac{\sqrt{n}}{\sigma}(\bar{X} - \mu_0) \le z_\alpha\\
&\text{reject } H_0 && \text{if } \frac{\sqrt{n}}{\sigma}(\bar{X} - \mu_0) > z_\alpha
\end{aligned} \tag{8.3.10}$$

This is called a one-sided critical region (since it calls for rejection only when $\bar{X}$ is large). Correspondingly, the hypothesis testing problem

$$H_0: \mu = \mu_0, \qquad H_1: \mu > \mu_0$$

is called a one-sided testing problem (in contrast to the two-sided problem that results when the alternative hypothesis is $H_1: \mu \ne \mu_0$).

To compute the p-value in the one-sided test, Equation 8.3.10, we first use the data to determine the value of the statistic $\sqrt{n}(\bar{X} - \mu_0)/\sigma$. The p-value is then equal to the probability that a standard normal would be at least as large as this value.

EXAMPLE 8.3e Suppose in Example 8.3a that we know in advance that the signal value is at least as large as 8. What can be concluded in this case?

SOLUTION To see if the data are consistent with the hypothesis that the mean is 8, we test

$$H_0: \mu = 8$$

against the one-sided alternative

$$H_1: \mu > 8$$

The value of the test statistic is $\sqrt{n}(\bar{X} - \mu_0)/\sigma = \sqrt{5}(9.5 - 8)/2 = 1.68$, and the p-value is the probability that a standard normal would exceed 1.68, namely,

$$p\text{-value} = 1 - \Phi(1.68) = .0465$$

Since the test would call for rejection at all significance levels greater than or equal to .0465, it would, for instance, reject the null hypothesis at the $\alpha = .05$ level of significance. ■
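A minimal Python sketch of the one-sided test of Equation 8.3.10, applied to the data of Example 8.3e, follows. Rejecting when the p-value is at most $\alpha$ matches Equation 8.3.10 except at the boundary point where the statistic equals $z_\alpha$ exactly. With the unrounded statistic 1.677, the p-value comes out near .047 rather than .0465.

```python
from math import sqrt
from statistics import NormalDist

def upper_tailed_z_test(xbar, mu0, sigma, n, alpha=0.05):
    """Test of H0: mu = mu0 (or mu <= mu0) versus H1: mu > mu0, Equation 8.3.10.

    Returns (statistic, p-value, reject); rejects when p-value <= alpha.
    """
    t = sqrt(n) * (xbar - mu0) / sigma
    p_value = 1 - NormalDist().cdf(t)   # P{Z >= t}
    return t, p_value, p_value <= alpha

t, p, reject = upper_tailed_z_test(9.5, 8, 2, 5)   # data of Example 8.3e
print(f"statistic {t:.3f}, p-value {p:.4f}, reject at .05: {reject}")
```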

The operating characteristic function of the one-sided test, Equation 8.3.10,

$$\beta(\mu) = P_\mu\{\text{accepting } H_0\}$$

can be obtained as follows:

$$\begin{aligned}
\beta(\mu) &= P_\mu\left\{\bar{X} \le \mu_0 + z_\alpha\frac{\sigma}{\sqrt{n}}\right\}\\
&= P_\mu\left\{\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le \frac{\mu_0 - \mu}{\sigma/\sqrt{n}} + z_\alpha\right\}\\
&= P\left\{Z \le \frac{\mu_0 - \mu}{\sigma/\sqrt{n}} + z_\alpha\right\}, \qquad Z \sim N(0, 1)
\end{aligned}$$

where the last equation follows since $\sqrt{n}(\bar{X} - \mu)/\sigma$ has a standard normal distribution. Hence we can write

$$\beta(\mu) = \Phi\left(\frac{\mu_0 - \mu}{\sigma/\sqrt{n}} + z_\alpha\right)$$

Since $\Phi$, being a distribution function, is increasing in its argument, it follows that $\beta(\mu)$ decreases in $\mu$, which is intuitively pleasing since it certainly seems reasonable that the larger the true mean $\mu$, the less likely it should be to conclude that $\mu \le \mu_0$. Also since $\Phi(z_\alpha) = 1 - \alpha$, it follows that

$$\beta(\mu_0) = 1 - \alpha$$

The test given by Equation 8.3.10, which was designed to test $H_0: \mu = \mu_0$ versus $H_1: \mu > \mu_0$, can also be used to test, at level of significance $\alpha$, the one-sided hypothesis

$$H_0: \mu \le \mu_0$$

versus

$$H_1: \mu > \mu_0$$

To verify that it remains a level $\alpha$ test, we need to show that the probability of rejection is never greater than $\alpha$ when $H_0$ is true. That is, we must verify that

$$1 - \beta(\mu) \le \alpha \quad \text{for all } \mu \le \mu_0$$

or

$$\beta(\mu) \ge 1 - \alpha \quad \text{for all } \mu \le \mu_0$$

But it has previously been shown that for the test given by Equation 8.3.10, $\beta(\mu)$ decreases in $\mu$ and $\beta(\mu_0) = 1 - \alpha$. This gives that

$$\beta(\mu) \ge \beta(\mu_0) = 1 - \alpha \quad \text{for all } \mu \le \mu_0$$

which shows that the test given by Equation 8.3.10 remains a level $\alpha$ test for $H_0: \mu \le \mu_0$ against the alternative hypothesis $H_1: \mu > \mu_0$.
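The argument above can be illustrated numerically. This Python sketch evaluates the rejection probability $1 - \beta(\mu)$ of the test in Equation 8.3.10 on a grid of values $\mu \le \mu_0$. The parameters are borrowed from Example 8.3a purely for illustration; any values would do. The output equals $\alpha$ at $\mu_0$ and only decreases as $\mu$ moves below $\mu_0$.

```python
from math import sqrt
from statistics import NormalDist

def oc_upper(mu, mu0, sigma, n, alpha=0.05):
    """beta(mu) = Phi(sqrt(n)(mu0 - mu)/sigma + z_alpha) for the test of Equation 8.3.10."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return NormalDist().cdf(sqrt(n) * (mu0 - mu) / sigma + z_alpha)

# Illustrative parameters (those of Example 8.3a).
mu0, sigma, n, alpha = 8.0, 2.0, 5, 0.05
grid = [mu0 - 0.5 * k for k in range(10)]            # mu0, mu0 - .5, ..., mu0 - 4.5
rejection_probs = [1 - oc_upper(mu, mu0, sigma, n, alpha) for mu in grid]

# Rejection probability equals alpha at mu0 and only shrinks as mu moves below mu0.
print([round(r, 4) for r in rejection_probs])
```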
REMARK

We can also test the one-sided hypothesis

$$H_0: \mu = \mu_0 \ (\text{or } \mu \ge \mu_0) \quad \text{versus} \quad H_1: \mu < \mu_0$$

at significance level $\alpha$ by

$$\begin{aligned}
&\text{accepting } H_0 && \text{if } \frac{\sqrt{n}}{\sigma}(\bar{X} - \mu_0) \ge -z_\alpha\\
&\text{rejecting } H_0 && \text{otherwise}
\end{aligned}$$

This test can alternatively be performed by first computing the value of the test statistic $\sqrt{n}(\bar{X} - \mu_0)/\sigma$. The p-value would then equal the probability that a standard normal would be less than this value, and the hypothesis would be rejected at any significance level greater than or equal to this p-value.

EXAMPLE 8.3f All cigarettes presently on the market have an average nicotine content of at least 1.6 mg per cigarette. A firm that produces cigarettes claims that it has discovered a new way to cure tobacco leaves that will result in the average nicotine content of a cigarette being less than 1.6 mg. To test this claim, a sample of 20 of the firm's cigarettes was analyzed. If it is known that the standard deviation of a cigarette's nicotine content is .8 mg, what conclusions can be drawn, at the 5 percent level of significance, if the average nicotine content of the 20 cigarettes is 1.54?

Note: The above raises the question of how we would know in advance that the standard deviation is .8. One possibility is that the variation in a cigarette's nicotine content is due to variability in the amount of tobacco in each cigarette and not to the method of curing that is used. Hence, the standard deviation can be known from previous experience.

SOLUTION We must first decide on the appropriate null hypothesis. As was previously noted, our approach to testing is not symmetric with respect to the null and the alternative hypotheses since we consider only tests having the property that their probability of rejecting the null hypothesis when it is true will never exceed the significance level $\alpha$. Thus, whereas rejection of the null hypothesis is a strong statement about the data not being consistent with this hypothesis, an analogous statement cannot be made when the null hypothesis is accepted. Hence, since in the preceding example we would like to endorse the producer's claim only when there is substantial evidence for it, we should take this claim as the alternative hypothesis. That is, we should test

$$H_0: \mu \ge 1.6 \quad \text{versus} \quad H_1: \mu < 1.6$$

Now, the value of the test statistic is

$$\sqrt{n}(\bar{X} - \mu_0)/\sigma = \sqrt{20}(1.54 - 1.6)/.8 = -.336$$
and so the p-value is given by

$$\begin{aligned}
p\text{-value} &= P\{Z < -.336\}, \qquad Z \sim N(0, 1)\\
&= .368
\end{aligned}$$

Since this value is greater than .05, the foregoing data do not enable us to reject, at the .05 level of significance, the hypothesis that the mean nicotine content is at least 1.6 mg. In other words, the evidence, although supporting the cigarette producer's claim, is not strong enough to prove that claim. ■
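The lower-tailed calculation of Example 8.3f is sketched below in Python, standard library only. The p-value is the left tail $P\{Z \le t\}$ rather than a right tail. The unrounded statistic is about $-.335$, and the p-value is about .37, well above .05.

```python
from math import sqrt
from statistics import NormalDist

# Example 8.3f: H0: mu >= 1.6 versus H1: mu < 1.6, sigma = .8 known, n = 20.
n, sigma, mu0, xbar = 20, 0.8, 1.6, 1.54
t = sqrt(n) * (xbar - mu0) / sigma      # test statistic
p_value = NormalDist().cdf(t)           # lower-tailed p-value: P{Z <= t}
print(f"statistic {t:.3f}, p-value {p_value:.3f}, reject at .05: {p_value <= 0.05}")
```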

REMARKS 

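(a) There is a direct analogy between confidence interval estimation and hypothesis testing. For instance, for a normal population having mean $\mu$ and known variance $\sigma^2$, we have shown in Section 7.3 that a $100(1-\alpha)$ percent confidence interval for $\mu$ is given by

$$\mu \in \left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$

where $\bar{x}$ is the observed sample mean. More formally, the preceding confidence interval statement is equivalent to

$$P\left\{\mu \in \left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)\right\} = 1 - \alpha$$

Hence, if $\mu = \mu_0$, then the probability that $\mu_0$ will fall in the interval

$$\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$

is $1 - \alpha$, implying that a significance level $\alpha$ test of $H_0: \mu = \mu_0$ versus $H_1: \mu \ne \mu_0$ is to reject $H_0$ when

$$\mu_0 \notin \left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$

Similarly, since a $100(1-\alpha)$ percent one-sided confidence interval for $\mu$ is given by

$$\mu \in \left(\bar{X} - z_\alpha\frac{\sigma}{\sqrt{n}},\ \infty\right)$$

it follows that an $\alpha$-level significance test of $H_0: \mu \le \mu_0$ versus $H_1: \mu > \mu_0$ is to reject $H_0$ when $\mu_0 \notin (\bar{X} - z_\alpha\sigma/\sqrt{n}, \infty)$, that is, when $\mu_0 < \bar{X} - z_\alpha\sigma/\sqrt{n}$.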
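The duality between the interval and the test can be checked numerically. This Python sketch uses illustrative values $\sigma = 2$, $n = 5$, $\mu_0 = 8$ and a grid of hypothetical sample means. It confirms that rejecting when $\mu_0$ falls outside the two-sided interval gives the same decisions as the test of Equation 8.3.3. The grid points are chosen away from the interval endpoints, where floating-point rounding could otherwise matter.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, alpha=0.05):
    """Two-sided 100(1 - alpha) percent confidence interval for mu with sigma known."""
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

def reject_by_interval(xbar, mu0, sigma, n, alpha=0.05):
    low, high = z_interval(xbar, sigma, n, alpha)
    return not (low < mu0 < high)        # mu0 outside the interval

def reject_by_statistic(xbar, mu0, sigma, n, alpha=0.05):
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return sqrt(n) * abs(xbar - mu0) / sigma > z   # Equation 8.3.3

means = [6.0 + 0.25 * k for k in range(17)]          # 6.0, 6.25, ..., 10.0
agree = all(reject_by_interval(m, 8, 2, 5) == reject_by_statistic(m, 8, 2, 5) for m in means)
print(agree)
```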
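TABLE 8.1 $X_1, \ldots, X_n$ Is a Sample from a $N(\mu, \sigma^2)$ Population, $\sigma^2$ Is Known, $\bar{X} = \sum_{i=1}^n X_i/n$

| $H_0$ | $H_1$ | Test Statistic TS | Significance Level $\alpha$ Test | p-Value if TS = $t$ |
|---|---|---|---|---|
| $\mu = \mu_0$ | $\mu \ne \mu_0$ | $\sqrt{n}(\bar{X} - \mu_0)/\sigma$ | Reject if $\lvert TS \rvert > z_{\alpha/2}$ | $2P\{Z \ge \lvert t \rvert\}$ |
| $\mu \le \mu_0$ | $\mu > \mu_0$ | $\sqrt{n}(\bar{X} - \mu_0)/\sigma$ | Reject if $TS > z_\alpha$ | $P\{Z \ge t\}$ |
| $\mu \ge \mu_0$ | $\mu < \mu_0$ | $\sqrt{n}(\bar{X} - \mu_0)/\sigma$ | Reject if $TS < -z_\alpha$ | $P\{Z \le t\}$ |

$Z$ is a standard normal random variable.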
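The three rows of Table 8.1 share one test statistic and differ only in which tail supplies the p-value. The following Python sketch collects them into a single function. The `alternative` labels are this sketch's own convention, not notation from the text. The usage lines rerun Examples 8.3b, 8.3e, and 8.3f.

```python
from math import sqrt
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alternative="two-sided"):
    """Return (TS, p-value) for the rows of Table 8.1; reject at level alpha if p-value <= alpha.

    alternative: "two-sided" (H1: mu != mu0), "greater" (H1: mu > mu0), "less" (H1: mu < mu0).
    """
    Phi = NormalDist().cdf
    ts = sqrt(n) * (xbar - mu0) / sigma
    if alternative == "two-sided":
        p = 2 * (1 - Phi(abs(ts)))
    elif alternative == "greater":
        p = 1 - Phi(ts)
    elif alternative == "less":
        p = Phi(ts)
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return ts, p

print(z_test(8.5, 8, 2, 5))                 # Example 8.3b
print(z_test(9.5, 8, 2, 5, "greater"))      # Example 8.3e
print(z_test(1.54, 1.6, 0.8, 20, "less"))   # Example 8.3f
```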



(b) A Remark on Robustness. A test that performs well even when the underlying assumptions on which it is based are violated is said to be robust. For instance, the tests of Sections 8.3.1 and 8.3.1.1 were derived under the assumption that the underlying population distribution is normal with known variance $\sigma^2$. However, in deriving these tests, this assumption was used only to conclude that $\bar{X}$ also has a normal distribution. But, by the central limit theorem, for a reasonably large sample size $\bar{X}$ will have an approximately normal distribution whatever the underlying distribution. Thus we can conclude that these tests will be relatively robust for any population distribution with variance $\sigma^2$.
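Remark (b) can be illustrated by simulation. The sketch below works under stated assumptions. It draws samples of size $n = 50$ from an exponential population with mean 1 and standard deviation 1, a skewed and clearly non-normal choice made only for illustration. It then applies the nominal 5 percent two-sided test of $H_0: \mu = 1$ that treats $\sigma = 1$ as known, and records how often $H_0$ is wrongly rejected. The function name `simulated_level` and the seed are hypothetical. If the test is robust, the estimated rejection rate should come out close to .05.

```python
import math
import random

def simulated_level(draw, n, mu, sigma, reps=10_000, z_crit=1.959964, seed=1):
    """Estimate the actual rejection rate of the nominal 5% two-sided test of
    H0: mean = mu (sigma known) when observations are generated by draw(rng),
    a population whose true mean is mu and true standard deviation is sigma."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        xbar = sum(draw(rng) for _ in range(n)) / n
        if abs(math.sqrt(n) * (xbar - mu) / sigma) > z_crit:   # z_crit = z_{.025}
            rejections += 1
    return rejections / reps

# Exponential population with mean 1 and variance 1, so H0: mu = 1 is true
level = simulated_level(lambda rng: rng.expovariate(1.0), n=50, mu=1.0, sigma=1.0)
print(f"estimated significance level: {level:.3f}")
```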

Table 8.1 summarizes the tests of this subsection. 

8.3.2 Case of Unknown Variance: The t-Test

Up to now we have supposed that the only unknown parameter of the normal population distribution is its mean. However, the more common situation is one where the mean $\mu$ and variance $\sigma^2$ are both unknown. Let us suppose this to be the case and again consider a test of the hypothesis that the mean is equal to some specified value $\mu_0$. That is, consider a test of

$$H_0: \mu = \mu_0$$

versus the alternative

$$H_1: \mu \neq \mu_0$$

It should be noted that the null hypothesis is not a simple hypothesis, since it does not specify the value of $\sigma^2$.

As before, it seems reasonable to reject $H_0$ when the sample mean $\bar{X}$ is far from $\mu_0$. However, how far away it need be to justify rejection will depend on the variance $\sigma^2$. Recall that when the value of $\sigma^2$ was known, the test called for rejecting $H_0$ when $\lvert \bar{X} - \mu_0 \rvert$ exceeded $z_{\alpha/2}\sigma/\sqrt{n}$ or, equivalently, when

$$\left\lvert \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \right\rvert > z_{\alpha/2}$$
Now when $\sigma$ is no longer known, it seems reasonable to estimate it by $S$, where

$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

and then to reject $H_0$ when

$$\left\lvert \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \right\rvert$$

is large.

To determine how large a value of the statistic

$$\frac{\sqrt{n}(\bar{X} - \mu_0)}{S}$$

to require for rejection, in order that the resulting test have significance level $\alpha$, we must determine the probability distribution of this statistic when $H_0$ is true. However, as shown in Section 6.5, the statistic $T$, defined by

$$T = \frac{\sqrt{n}(\bar{X} - \mu_0)}{S}$$

has, when $\mu = \mu_0$, a $t$-distribution with $n - 1$ degrees of freedom. Hence,

$$P_{\mu_0}\left\{-t_{\alpha/2,n-1} \le \frac{\sqrt{n}(\bar{X} - \mu_0)}{S} \le t_{\alpha/2,n-1}\right\} = 1 - \alpha \tag{8.3.11}$$

where $t_{\alpha/2,n-1}$ is the $100\,\alpha/2$ upper percentile value of the $t$-distribution with $n - 1$ degrees of freedom. (That is, $P\{T_{n-1} \ge t_{\alpha/2,n-1}\} = P\{T_{n-1} \le -t_{\alpha/2,n-1}\} = \alpha/2$ when $T_{n-1}$ has a $t$-distribution with $n - 1$ degrees of freedom.) From Equation 8.3.11 we see that the appropriate significance level $\alpha$ test of



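$$H_0: \mu = \mu_0 \quad \text{versus} \quad H_1: \mu \neq \mu_0$$

is, when $\sigma^2$ is unknown, to

$$\begin{aligned}
&\text{accept } H_0 \text{ if } \left\lvert \frac{\sqrt{n}(\bar{X} - \mu_0)}{S} \right\rvert \le t_{\alpha/2,n-1} \\
&\text{reject } H_0 \text{ if } \left\lvert \frac{\sqrt{n}(\bar{X} - \mu_0)}{S} \right\rvert > t_{\alpha/2,n-1}
\end{aligned} \tag{8.3.12}$$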
FIGURE 8.3 The two-sided t-test.



The test defined by Equation 8.3.12 is called a two-sided t-test. It is pictorially illustrated 
in Figure 8.3. 

If we let $t$ denote the observed value of the test statistic $T = \sqrt{n}(\bar{X} - \mu_0)/S$, then the $p$-value of the test is the probability that $\lvert T \rvert$ would exceed $\lvert t \rvert$ when $H_0$ is true. That is, the $p$-value is the probability that the absolute value of a $t$-random variable with $n - 1$ degrees of freedom would exceed $\lvert t \rvert$. The test then calls for rejection at all significance levels higher than the $p$-value and acceptance at all lower significance levels.

Program 8.3.2 computes the value of the test statistic and the corresponding $p$-value. It can be applied both for one- and two-sided tests. (The one-sided material will be presented shortly.)
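The $p$-value computation just described can be sketched in Python. This is not Program 8.3.2 itself. It is a hypothetical stand-alone version that assumes only the standard library. The helper `t_sf` approximates $P\{T_k \ge t\}$ by applying Simpson's rule to the $t$ density on $[0, \lvert t \rvert]$ and using the symmetry of the density about 0. The names `t_pdf`, `t_sf`, and `one_sample_t_test` are illustrative. The one-sided options anticipate the tests presented below.

```python
import math
from statistics import mean, stdev

def t_pdf(x, df):
    """Density of a t-random variable with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_sf(t, df, steps=2000):
    """P{T_df >= t}, via Simpson's rule on [0, |t|] plus symmetry about 0."""
    a = abs(t)
    h = a / steps                       # steps must be even for Simpson's rule
    s = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = s * h / 3                    # P{0 <= T_df <= |t|}
    return 0.5 - area if t >= 0 else 0.5 + area

def one_sample_t_test(data, mu0, alternative="two-sided"):
    """Return (TS, p-value) for the t-test of H0: mu = mu0 with sigma unknown.

    alternative: "two-sided" (H1: mu != mu0), "greater" (H1: mu > mu0),
    or "less" (H1: mu < mu0).
    """
    n = len(data)
    ts = math.sqrt(n) * (mean(data) - mu0) / stdev(data)   # stdev uses n - 1
    if alternative == "two-sided":
        p = 2 * t_sf(abs(ts), n - 1)
    elif alternative == "greater":
        p = t_sf(ts, n - 1)
    elif alternative == "less":
        p = 1 - t_sf(ts, n - 1)
    else:
        raise ValueError("unknown alternative: " + alternative)
    return ts, p
```

Applied to the water-use data of Example 8.3h with `mu0=350`, or to the tire data of Example 8.3i with `mu0=40` and `alternative="less"`, it should reproduce the statistics and $p$-values reported there up to rounding.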

EXAMPLE 8.3g Among a clinic's patients having blood cholesterol levels in the medium-to-high range (at least 220 milligrams per deciliter of serum), volunteers were recruited to test a new drug designed to reduce blood cholesterol. A group of 50 volunteers was given the drug for 1 month and the changes in their blood cholesterol levels were noted. If the average change was a reduction of 14.8 with a sample standard deviation of 6.4, what conclusions can be drawn?

SOLUTION Let us start by testing the hypothesis that the change could be due solely to chance, that is, that the 50 changes constitute a normal sample with mean 0. Because the value of the $t$-statistic used to test the hypothesis that a normal mean is equal to 0 is

$$T = \sqrt{n}\,\bar{X}/S = \sqrt{50}\,(14.8)/6.4 = 16.352$$

it is clear that we should reject the hypothesis that the changes were solely due to chance. 
Unfortunately, however, we are not justified at this point in concluding that the changes 
were due to the specific drug used and not to some other possibility. For instance, it is 
well known that any medication received by a patient (whether or not this medication is 
directly relevant to the patient's suffering) often leads to an improvement in the patient's 
condition — the so-called placebo effect. Also, another possibility that may need to be 
taken into account would be the weather conditions during the month of testing, for it is 
certainly conceivable that this affects blood cholesterol level. Indeed, it must be concluded 
that the foregoing was a very poorly designed experiment, for in order to test whether 
a specific treatment has an effect on a disease that may be affected by many things, we 
should try to design the experiment so as to neutralize all other possible causes. The 
accepted approach for accomplishing this is to divide the volunteers at random into two 
groups: one group to receive the drug and the other to receive a placebo (that is, a tablet
that looks and tastes like the actual drug but has no physiological effect). The volunteers 
should not be told whether they are in the actual or control group, and indeed it is best if 
even the clinicians do not have this information (the so-called double-blind test) so as not 
to allow their own biases to play a role. Since the two groups are chosen at random from 
among the volunteers, we can now hope that on average all factors affecting the two groups 
will be the same except that one received the actual drug and the other a placebo. Hence, 
any difference in performance between the groups can be attributed to the drug. ■ 

EXAMPLE 8.3h A public health official claims that the mean home water use is 350 gallons a day. To verify this claim, a study of 20 randomly selected homes was conducted, with the result that the average daily water uses of these 20 homes were as follows:

340 344 362 375 

356 386 354 364 

332 402 340 355 

362 322 372 324 

318 360 338 370 

Do the data contradict the official's claim? 

SOLUTION To determine if the data contradict the official's claim, we need to test

$$H_0: \mu = 350 \quad \text{versus} \quad H_1: \mu \neq 350$$

This can be accomplished by running Program 8.3.2 or, if it is inconvenient to utilize, by noting first that the sample mean and sample standard deviation of the preceding data set are

$$\bar{X} = 353.8, \qquad S = 21.8478$$

Thus, the value of the test statistic is

$$T = \frac{\sqrt{20}\,(3.8)}{21.8478} = .7778$$

Because this is less than $t_{.05,19} = 1.729$, the null hypothesis is accepted at the 10 percent level of significance. Indeed, the $p$-value of the test data is

$$p\text{-value} = P\{\lvert T_{19} \rvert > .7778\} = 2P\{T_{19} > .7778\} = .4462$$

indicating that the null hypothesis would be accepted at any reasonable significance level, and thus that the data are not inconsistent with the claim of the health official. ■
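The arithmetic of Example 8.3h can be reproduced with Python's standard `statistics` module, whose `stdev` uses the $n - 1$ divisor and so matches $S$. This is a sketch: the critical value $t_{.05,19} = 1.729$ is typed in from a $t$ table rather than computed, and the variable names are illustrative.

```python
import math
from statistics import mean, stdev

# Daily water use (gallons) of the 20 homes in Example 8.3h
water = [340, 344, 362, 375, 356, 386, 354, 364, 332, 402,
         340, 355, 362, 322, 372, 324, 318, 360, 338, 370]

n, mu0 = len(water), 350
xbar, s = mean(water), stdev(water)        # sample mean and S (n - 1 divisor)
ts = math.sqrt(n) * (xbar - mu0) / s       # T = sqrt(n)(Xbar - mu0)/S
t_crit = 1.729                             # t_{.05,19}, from a t table
reject_at_10_percent = abs(ts) > t_crit    # two-sided test at level .10
print(f"Xbar = {xbar:.1f}, S = {s:.4f}, T = {ts:.4f}, reject: {reject_at_10_percent}")
```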

We can use a one-sided $t$-test to test the hypothesis

$$H_0: \mu = \mu_0 \quad (\text{or } H_0: \mu \le \mu_0)$$

against the one-sided alternative

$$H_1: \mu > \mu_0$$

The significance level $\alpha$ test is to

$$\begin{aligned}
&\text{accept } H_0 \text{ if } \frac{\sqrt{n}(\bar{X} - \mu_0)}{S} \le t_{\alpha,n-1} \\
&\text{reject } H_0 \text{ if } \frac{\sqrt{n}(\bar{X} - \mu_0)}{S} > t_{\alpha,n-1}
\end{aligned} \tag{8.3.13}$$

If $\sqrt{n}(\bar{X} - \mu_0)/S = v$, then the $p$-value of the test is the probability that a $t$-random variable with $n - 1$ degrees of freedom would be at least as large as $v$.
The significance level $\alpha$ test of

$$H_0: \mu = \mu_0 \quad (\text{or } H_0: \mu \ge \mu_0)$$

versus the alternative

$$H_1: \mu < \mu_0$$

is to

$$\begin{aligned}
&\text{accept } H_0 \text{ if } \frac{\sqrt{n}(\bar{X} - \mu_0)}{S} \ge -t_{\alpha,n-1} \\
&\text{reject } H_0 \text{ if } \frac{\sqrt{n}(\bar{X} - \mu_0)}{S} < -t_{\alpha,n-1}
\end{aligned}$$

The $p$-value of this test is the probability that a $t$-random variable with $n - 1$ degrees of freedom would be less than or equal to the observed value of $\sqrt{n}(\bar{X} - \mu_0)/S$.

EXAMPLE 8.3i The manufacturer of a new fiberglass tire claims that its average life will be 
at least 40,000 miles. To verify this claim a sample of 12 tires is tested, with their lifetimes 
(in 1,000s of miles) being as follows: 

| Tire | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Life | 36.1 | 40.2 | 33.8 | 38.5 | 42 | 35.8 | 37 | 41 | 36.8 | 37.2 | 33 | 36 |

Test the manufacturer's claim at the 5 percent level of significance.

SOLUTION To determine whether the foregoing data are consistent with the hypothesis that the mean life is at least 40,000 miles, we will test

$$H_0: \mu \ge 40{,}000 \quad \text{versus} \quad H_1: \mu < 40{,}000$$
A computation gives that

$$\bar{X} = 37.2833, \qquad S = 2.7319$$

and so the value of the test statistic is

$$T = \frac{\sqrt{12}\,(37.2833 - 40)}{2.7319} = -3.4448$$

Since this is less than $-t_{.05,11} = -1.796$, the null hypothesis is rejected at the 5 percent level of significance. Indeed, the $p$-value of the test data is

$$p\text{-value} = P\{T_{11} < -3.4448\} = P\{T_{11} > 3.4448\} = .0028$$

indicating that the manufacturer's claim would be rejected at any significance level greater than .003. ■

The preceding could also have been obtained by using Program 8.3.2, as illustrated in 
Figure 8.4. 



FIGURE 8.4 Program 8.3.2 ("The p-value of the One-sample t-Test") applied to the 12 tire lifetimes of Example 8.3i, with $\mu_0 = 40$ and the one-sided alternative that the mean is less than $\mu_0$. Output: the value of the t-statistic is -3.4448; the p-value is 0.0028.
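The one-sided computation of Example 8.3i can be checked in a few lines of Python. In this sketch the critical value $t_{.05,11} = 1.796$ is taken from a $t$ table, and the variable names are illustrative.

```python
import math
from statistics import mean, stdev

# Tire lifetimes (in 1,000s of miles) from Example 8.3i
life = [36.1, 40.2, 33.8, 38.5, 42, 35.8, 37, 41, 36.8, 37.2, 33, 36]

n, mu0 = len(life), 40
ts = math.sqrt(n) * (mean(life) - mu0) / stdev(life)   # T = sqrt(n)(Xbar - mu0)/S
t_crit = 1.796                                          # t_{.05,11}
reject = ts < -t_crit   # H0: mu >= 40 is rejected at level .05 when T < -t_{.05,11}
print(f"T = {ts:.4f}, reject H0: {reject}")
```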
EXAMPLE 8.3j In a single-server queueing system in which customers arrive according to a Poisson process, the long-run average queueing delay per customer depends on the service distribution through its mean and variance. Indeed, if $\mu$ is the mean service time, and $\sigma^2$ is the variance of a service time, then the average amount of time that a customer spends waiting in queue is given by

$$\frac{\lambda(\mu^2 + \sigma^2)}{2(1 - \lambda\mu)}$$

provided that $\lambda\mu < 1$, where $\lambda$ is the arrival rate. (The average delay is infinite if $\lambda\mu \ge 1$.) As can be seen from this formula, the average delay is quite large when $\mu$ is only slightly smaller than $1/\lambda$, where, since $\lambda$ is the arrival rate, $1/\lambda$ is the average time between arrivals.
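The following sketch evaluates the average-delay formula above. The function name `mean_queueing_delay` and the numbers in the loop (one arrival every 10 minutes on average, service-time variance 4) are hypothetical. As the mean service time $\mu$ approaches the mean interarrival time $1/\lambda = 10$, the computed delay grows without bound.

```python
def mean_queueing_delay(lam, mu, var):
    """Average time in queue, lam*(mu**2 + var) / (2*(1 - lam*mu)), for a
    single-server queue with Poisson arrivals at rate lam and service times
    with mean mu and variance var; infinite when lam*mu >= 1."""
    if lam * mu >= 1:
        return float("inf")
    return lam * (mu ** 2 + var) / (2 * (1 - lam * mu))

# Hypothetical system: lam = 0.1 arrivals per minute, service-time variance 4
for mu in (5.0, 8.0, 9.5, 9.9, 10.0):
    print(mu, mean_queueing_delay(0.1, mu, 4.0))
```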

Suppose that the owner of a service station will hire a second server if it can be shown 
that the average service time exceeds 8 minutes. The following data give the service times 
(in minutes) of 28 customers of this queueing system. Do they indicate that the mean 
service time is greater than 8 minutes? 

8.6, 9.4, 5.0, 4.4, 3.7, 11.4, 10.0, 7.6, 14.4, 12.2, 11.0, 14.4, 9.3, 10.5,
10.3, 7.7, 8.3, 6.4, 9.2, 5.7, 7.9, 9.4, 9.0, 13.3, 11.6, 10.0, 9.5, 6.6

SOLUTION Let us use the preceding data to test the null hypothesis that the mean service time is less than or equal to 8 minutes. A small $p$-value will then be strong evidence that the mean service time is greater than 8 minutes. Running Program 8.3.2 on these data shows that the value of the test statistic is 2.257, with a resulting $p$-value of .016. Such a small $p$-value is certainly strong evidence that the mean service time exceeds 8 minutes. ■
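For Example 8.3j, the following sketch recomputes the test statistic from the 28 service times. It does not compute the $p$-value itself. Instead it checks that the statistic falls between the table values $t_{.05,27} = 1.703$ and $t_{.01,27} = 2.473$, which places the one-sided $p$-value between .01 and .05, in line with the reported .016. Variable names are illustrative.

```python
import math
from statistics import mean, stdev

# Service times (minutes) of the 28 customers in Example 8.3j
times = [8.6, 9.4, 5.0, 4.4, 3.7, 11.4, 10.0, 7.6, 14.4, 12.2, 11.0, 14.4, 9.3, 10.5,
         10.3, 7.7, 8.3, 6.4, 9.2, 5.7, 7.9, 9.4, 9.0, 13.3, 11.6, 10.0, 9.5, 6.6]

n, mu0 = len(times), 8
ts = math.sqrt(n) * (mean(times) - mu0) / stdev(times)   # H0: mu <= 8 vs H1: mu > 8
# t_{.05,27} = 1.703 < TS < t_{.01,27} = 2.473 means .01 < P{T_27 >= TS} < .05
between = 1.703 < ts < 2.473
print(f"n = {n}, T = {ts:.4f}, .01 < p-value < .05: {between}")
```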

Table 8.2 summarizes the tests of this subsection. 



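TABLE 8.2 $X_1, \ldots, X_n$ Is a Sample from a $\mathcal{N}(\mu, \sigma^2)$ Population; $\sigma^2$ Is Unknown; $\bar{X} = \sum_{i=1}^{n} X_i/n$, $S^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2/(n-1)$

| $H_0$ | $H_1$ | Test Statistic TS | Significance Level $\alpha$ Test | $p$-Value if TS $= t$ |
|---|---|---|---|---|
| $\mu = \mu_0$ | $\mu \neq \mu_0$ | $\sqrt{n}(\bar{X} - \mu_0)/S$ | Reject if $\lvert \mathrm{TS} \rvert > t_{\alpha/2,n-1}$ | $2P\{T_{n-1} \ge \lvert t \rvert\}$ |
| $\mu \le \mu_0$ | $\mu > \mu_0$ | $\sqrt{n}(\bar{X} - \mu_0)/S$ | Reject if $\mathrm{TS} > t_{\alpha,n-1}$ | $P\{T_{n-1} \ge t\}$ |
| $\mu \ge \mu_0$ | $\mu < \mu_0$ | $\sqrt{n}(\bar{X} - \mu_0)/S$ | Reject if $\mathrm{TS} < -t_{\alpha,n-1}$ | $P\{T_{n-1} \le t\}$ |

$T_{n-1}$ is a $t$-random variable with $n - 1$ degrees of freedom: $P\{T_{n-1} > t_{\alpha,n-1}\} = \alpha$.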
8.4 TESTING THE EQUALITY OF MEANS OF TWO NORMAL POPULATIONS

A common situation faced by a practicing engineer is one in which she must decide whether 
two different approaches lead to the same solution. Often such a situation can be modeled 
as a test of the hypothesis that two normal populations have the same mean value. 

8.4.1 Case of Known Variances

Suppose that $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_m$ are independent samples from normal populations having unknown means $\mu_x$ and $\mu_y$ but known variances $\sigma_x^2$ and $\sigma_y^2$. Let us consider the problem of testing the hypothesis

$$H_0: \mu_x = \mu_y$$

versus the alternative

$$H_1: \mu_x \neq \mu_y$$

Since $\bar{X}$ is an estimate of $\mu_x$ and $\bar{Y}$ of $\mu_y$, it follows that $\bar{X} - \bar{Y}$ can be used to estimate $\mu_x - \mu_y$. Hence, because the null hypothesis can be written as $H_0: \mu_x - \mu_y = 0$, it seems reasonable to reject it when $\bar{X} - \bar{Y}$ is far from zero. That is, the form of the test should be to

$$\begin{aligned}
&\text{reject } H_0 \text{ if } \lvert \bar{X} - \bar{Y} \rvert > c \\
&\text{accept } H_0 \text{ if } \lvert \bar{X} - \bar{Y} \rvert \le c
\end{aligned} \tag{8.4.1}$$

for some suitably chosen value $c$.

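To determine that value of $c$ that would result in the test in Equations 8.4.1 having a significance level $\alpha$, we need to determine the distribution of $\bar{X} - \bar{Y}$ when $H_0$ is true. However, as was shown in Section 7.3.2,

$$\bar{X} - \bar{Y} \sim \mathcal{N}\left(\mu_x - \mu_y,\; \frac{\sigma_x^2}{n} + \frac{\sigma_y^2}{m}\right)$$

which implies that

$$\frac{\bar{X} - \bar{Y} - (\mu_x - \mu_y)}{\sqrt{\dfrac{\sigma_x^2}{n} + \dfrac{\sigma_y^2}{m}}} \sim \mathcal{N}(0, 1) \tag{8.4.2}$$

Hence, when $H_0$ is true (and so $\mu_x - \mu_y = 0$), it follows that

$$\frac{\bar{X} - \bar{Y}}{\sqrt{\sigma_x^2/n + \sigma_y^2/m}}$$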
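has a standard normal distribution; and thus

$$P_{H_0}\left\{-z_{\alpha/2} \le \frac{\bar{X} - \bar{Y}}{\sqrt{\sigma_x^2/n + \sigma_y^2/m}} \le z_{\alpha/2}\right\} = 1 - \alpha \tag{8.4.3}$$

From Equation 8.4.3, we obtain that the significance level $\alpha$ test of $H_0: \mu_x = \mu_y$ versus $H_1: \mu_x \neq \mu_y$ is

$$\begin{aligned}
&\text{accept } H_0 \text{ if } \frac{\lvert \bar{X} - \bar{Y} \rvert}{\sqrt{\sigma_x^2/n + \sigma_y^2/m}} \le z_{\alpha/2} \\
&\text{reject } H_0 \text{ if } \frac{\lvert \bar{X} - \bar{Y} \rvert}{\sqrt{\sigma_x^2/n + \sigma_y^2/m}} > z_{\alpha/2}
\end{aligned}$$

Program 8.4.1 will compute the value of the test statistic $(\bar{X} - \bar{Y})/\sqrt{\sigma_x^2/n + \sigma_y^2/m}$.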



EXAMPLE 8.4a Two new methods for producing a tire have been proposed. To ascertain 
which is superior, a tire manufacturer produces a sample of 10 tires using the first method 
and a sample of 8 using the second. The first set is to be road tested at location A and the 
second at location B. It is known from past experience that the lifetime of a tire that is 
road tested at one of these locations is normally distributed with a mean life due to the tire 
but with a variance due (for the most part) to the location. Specifically, it is known that 
the lifetimes of tires tested at location A are normal with standard deviation equal to 4,000 
kilometers, whereas those tested at location B are normal with $\sigma = 6{,}000$ kilometers. If the
manufacturer is interested in testing the hypothesis that there is no appreciable difference 
in the mean life of tires produced by either method, what conclusion should be drawn at 
the 5 percent level of significance if the resulting data are as given in Table 8.3? 

TABLE 8.3 Tire Lives in Units of 100 Kilometers

| Tires Tested at A | Tires Tested at B |
|---|---|
| 61.1 | 62.2 |
| 58.2 | 56.6 |
| 62.3 | 66.4 |
| 64 | 56.2 |
| 59.7 | 57.4 |
| 66.2 | 58.4 |
| 57.8 | 57.6 |
| 61.4 | 65.4 |
| 62.2 | |
| 63.6 | |
SOLUTION A simple computation (or the use of Program 8.4.1) shows that the value of the test statistic is .066. For such a small value of the test statistic (which has a standard normal distribution when $H_0$ is true), it is clear that the null hypothesis is accepted. ■
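A minimal Python sketch of the known-variance test, applied to Table 8.3, follows. Because the data are recorded in units of 100 kilometers, the standard deviations of 4,000 and 6,000 kilometers become 40 and 60. The function name `two_sample_z_test` and its `alternative` argument are illustrative, and the normal distribution function comes from `statistics.NormalDist`.

```python
import math
from statistics import NormalDist, mean

def two_sample_z_test(x, y, sigma_x, sigma_y, alternative="two-sided"):
    """Test of H0: mu_x = mu_y when sigma_x and sigma_y are known.

    Returns (TS, p-value). alternative is "two-sided" (H1: mu_x != mu_y),
    "greater" (H1: mu_x > mu_y), or "less" (H1: mu_x < mu_y).
    """
    se = math.sqrt(sigma_x ** 2 / len(x) + sigma_y ** 2 / len(y))
    ts = (mean(x) - mean(y)) / se
    Z = NormalDist()
    if alternative == "two-sided":
        p = 2 * (1 - Z.cdf(abs(ts)))
    elif alternative == "greater":
        p = 1 - Z.cdf(ts)
    elif alternative == "less":
        p = Z.cdf(ts)
    else:
        raise ValueError("unknown alternative: " + alternative)
    return ts, p

# Table 8.3 (units of 100 km)
site_a = [61.1, 58.2, 62.3, 64, 59.7, 66.2, 57.8, 61.4, 62.2, 63.6]
site_b = [62.2, 56.6, 66.4, 56.2, 57.4, 58.4, 57.6, 65.4]
ts, p = two_sample_z_test(site_a, site_b, sigma_x=40, sigma_y=60)
print(f"TS = {ts:.3f}, p-value = {p:.3f}")
```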

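It follows from Equation 8.4.1 that a test of the hypothesis $H_0: \mu_x = \mu_y$ (or $H_0: \mu_x \le \mu_y$) against the one-sided alternative $H_1: \mu_x > \mu_y$ would be to

$$\begin{aligned}
&\text{accept } H_0 \text{ if } \bar{X} - \bar{Y} \le z_{\alpha}\sqrt{\frac{\sigma_x^2}{n} + \frac{\sigma_y^2}{m}} \\
&\text{reject } H_0 \text{ if } \bar{X} - \bar{Y} > z_{\alpha}\sqrt{\frac{\sigma_x^2}{n} + \frac{\sigma_y^2}{m}}
\end{aligned}$$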

8.4.2 Case of Unknown Variances 

Suppose again that $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_m$ are independent samples from normal populations having respective parameters $(\mu_x, \sigma_x^2)$ and $(\mu_y, \sigma_y^2)$, but now suppose that all four parameters are unknown. We will once again consider a test of

$$H_0: \mu_x = \mu_y \quad \text{versus} \quad H_1: \mu_x \neq \mu_y$$

To determine a significance level $\alpha$ test of $H_0$ we will need to make the additional assumption that the unknown variances $\sigma_x^2$ and $\sigma_y^2$ are equal. Let $\sigma^2$ denote their value, that is,

$$\sigma^2 = \sigma_x^2 = \sigma_y^2$$

As before, we would like to reject $H_0$ when $\bar{X} - \bar{Y}$ is "far" from zero. To determine how far from zero it need be, let

$$S_x^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}, \qquad S_y^2 = \frac{\sum_{i=1}^{m} (Y_i - \bar{Y})^2}{m - 1}$$

denote the sample variances of the two samples. Then, as was shown in Section 7.3.2,

$$\frac{\bar{X} - \bar{Y} - (\mu_x - \mu_y)}{\sqrt{S_p^2(1/n + 1/m)}} \sim t_{n+m-2}$$
where $S_p^2$, the pooled estimator of the common variance $\sigma^2$, is given by

$$S_p^2 = \frac{(n - 1)S_x^2 + (m - 1)S_y^2}{n + m - 2}$$

Hence, when $H_0$ is true, and so $\mu_x - \mu_y = 0$, the statistic

$$T \equiv \frac{\bar{X} - \bar{Y}}{\sqrt{S_p^2(1/n + 1/m)}}$$

has a $t$-distribution with $n + m - 2$ degrees of freedom. From this, it follows that we can test the hypothesis that $\mu_x = \mu_y$ as follows:

$$\begin{aligned}
&\text{accept } H_0 \text{ if } \lvert T \rvert \le t_{\alpha/2,n+m-2} \\
&\text{reject } H_0 \text{ if } \lvert T \rvert > t_{\alpha/2,n+m-2}
\end{aligned}$$

where $t_{\alpha/2,n+m-2}$ is the $100\,\alpha/2$ upper percentile point of a $t$-random variable with $n + m - 2$ degrees of freedom (see Figure 8.5).

FIGURE 8.5 Density of a t-random variable with k degrees of freedom; each of the two shaded tails has area α.

Alternatively, the test can be run by determining the $p$-value. If $T$ is observed to equal $v$, then the resulting $p$-value of the test of $H_0$ against $H_1$ is given by

$$p\text{-value} = P\{\lvert T_{n+m-2} \rvert \ge \lvert v \rvert\} = 2P\{T_{n+m-2} \ge \lvert v \rvert\}$$

where $T_{n+m-2}$ is a $t$-random variable having $n + m - 2$ degrees of freedom.
If we are interested in testing the one-sided hypothesis

$$H_0: \mu_x \le \mu_y \quad \text{versus} \quad H_1: \mu_x > \mu_y$$

then $H_0$ will be rejected at large values of $T$. Thus the significance level $\alpha$ test is to

$$\begin{aligned}
&\text{reject } H_0 \text{ if } T > t_{\alpha,n+m-2} \\
&\text{not reject } H_0 \text{ otherwise}
\end{aligned}$$

If the value of the test statistic $T$ is $v$, then the $p$-value is given by

$$p\text{-value} = P\{T_{n+m-2} \ge v\}$$

Program 8.4.2 computes both the value of the test statistic and the corresponding $p$-value.

EXAMPLE 8.4b Twenty-two volunteers at a cold research institute caught a cold after having
been exposed to various cold viruses. A random selection of 10 of these volunteers was 
given tablets containing 1 gram of vitamin C. These tablets were taken four times a day. 
The control group consisting of the other 12 volunteers was given placebo tablets that 
looked and tasted exactly the same as the vitamin C tablets. This was continued for each 
volunteer until a doctor, who did not know if the volunteer was receiving the vitamin C 
or the placebo tablets, decided that the volunteer was no longer suffering from the cold. 
The length of time the cold lasted was then recorded. 

At the end of this experiment, the following data resulted. 



| Treated with Vitamin C | Treated with Placebo |
|---|---|
| 5.5 | 6.5 |
| 6.0 | 6.0 |
| 7.0 | 8.5 |
| 6.0 | 7.0 |
| 7.5 | 6.5 |
| 6.0 | 8.0 |
| 7.5 | 7.5 |
| 5.5 | 6.5 |
| 7.0 | 7.5 |
| 6.5 | 6.0 |
| | 8.5 |
| | 7.0 |



Do the data listed prove that taking 4 grams daily of vitamin C reduces the mean length 
of time a cold lasts? At what level of significance? 

SOLUTION To prove the above hypothesis, we would need to reject the null hypothesis in 
a test of 

$$H_0: \mu_p \le \mu_c \quad \text{versus} \quad H_1: \mu_p > \mu_c$$

where $\mu_c$ is the mean time a cold lasts when the vitamin C tablets are taken and $\mu_p$ is the mean time when the placebo is taken. Assuming that the variance of the length of the cold is the same for the vitamin C patients and the placebo patients, we test the above by running Program 8.4.2. This yields the information shown in Figure 8.6. Thus $H_0$ would be rejected at the 5 percent level of significance.
FIGURE 8.6 Program 8.4.2 ("The p-value of the Two-sample t-Test") applied to the data of Example 8.4b, with List 1 the 10 vitamin C values, List 2 the 12 placebo values, and the one-sided alternative that the mean of sample 1 is less than the mean of sample 2. Output: the value of the t-statistic is -1.898695; the p-value is 0.03607.



Of course, if it were not convenient to run Program 8.4.2 then we could have performed the test by first computing the values of the statistics $\bar{X}$, $\bar{Y}$, $S_x^2$, $S_y^2$, and $S_p^2$, where the $X$ sample corresponds to those receiving vitamin C and the $Y$ sample to those receiving a placebo. These computations would give the values

$$\bar{X} = 6.450, \quad S_x^2 = .581, \quad \bar{Y} = 7.125, \quad S_y^2 = .778$$

Therefore,

$$S_p^2 = \frac{9}{20}S_x^2 + \frac{11}{20}S_y^2 = .689$$

and the value of the test statistic is

$$\mathrm{TS} = \frac{-.675}{\sqrt{.689(1/10 + 1/12)}} = -1.90$$

Since $\mathrm{TS} < -t_{.05,20} = -1.725$, the null hypothesis is rejected at the 5 percent level of significance. That is, at the 5 percent level of significance the evidence is significant in establishing that vitamin C reduces the mean time that a cold persists. ■

EXAMPLE 8.4c Reconsider Example 8.4a, but now suppose that the population variances 
are unknown but equal. 

SOLUTION Using Program 8.4.2 yields that the value of the test statistic is 1.028, and the resulting $p$-value is

$$p\text{-value} = P\{\lvert T_{16} \rvert \ge 1.028\} = 2P\{T_{16} \ge 1.028\} = .3192$$

Thus, the null hypothesis is accepted at any significance level less than .3192. ■
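The pooled computations of Examples 8.4b and 8.4c can be sketched as follows. The helper `pooled_t_statistic` is a hypothetical name. `statistics.variance` uses the $n - 1$ divisor, so it returns $S_x^2$ and $S_y^2$ as defined above. The critical values $t_{.05,20} = 1.725$ and $t_{.025,16} = 2.120$ come from a $t$ table.

```python
import math
from statistics import mean, variance

def pooled_t_statistic(x, y):
    """Return (T, degrees of freedom) for the equal-variance two-sample t-test,
    T = (Xbar - Ybar) / sqrt(Sp^2 (1/n + 1/m)),
    Sp^2 = ((n - 1) Sx^2 + (m - 1) Sy^2) / (n + m - 2)."""
    n, m = len(x), len(y)
    sp2 = ((n - 1) * variance(x) + (m - 1) * variance(y)) / (n + m - 2)
    return (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / n + 1 / m)), n + m - 2

# Example 8.4b: X = vitamin C group, Y = placebo group
vitamin_c = [5.5, 6.0, 7.0, 6.0, 7.5, 6.0, 7.5, 5.5, 7.0, 6.5]
placebo = [6.5, 6.0, 8.5, 7.0, 6.5, 8.0, 7.5, 6.5, 7.5, 6.0, 8.5, 7.0]
t_b, df_b = pooled_t_statistic(vitamin_c, placebo)
print(t_b, df_b, t_b < -1.725)        # one-sided test at level .05

# Example 8.4c: the tire data of Table 8.3, variances assumed equal
site_a = [61.1, 58.2, 62.3, 64, 59.7, 66.2, 57.8, 61.4, 62.2, 63.6]
site_b = [62.2, 56.6, 66.4, 56.2, 57.4, 58.4, 57.6, 65.4]
t_c, df_c = pooled_t_statistic(site_a, site_b)
print(t_c, df_c, abs(t_c) > 2.120)    # two-sided test at level .05
```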

8.4.3 Case of Unknown and Unequal Variances 

Let us now suppose that the population variances $\sigma_x^2$ and $\sigma_y^2$ are not only unknown but also cannot be considered to be equal. In this situation, since $S_x^2$ is the natural estimator of $\sigma_x^2$ and $S_y^2$ of $\sigma_y^2$, it would seem reasonable to base our test of

$$H_0: \mu_x = \mu_y \quad \text{versus} \quad H_1: \mu_x \neq \mu_y$$

on the test statistic

$$\frac{\bar{X} - \bar{Y}}{\sqrt{\dfrac{S_x^2}{n} + \dfrac{S_y^2}{m}}} \tag{8.4.4}$$

However, the foregoing has a complicated distribution, which, even when $H_0$ is true, depends on the unknown parameters, and thus cannot be generally employed. The one situation in which we can utilize the statistic of Equation 8.4.4 is when $n$ and $m$ are both large. In such a case, it can be shown that when $H_0$ is true the statistic of Equation 8.4.4 will have approximately a standard normal distribution. Hence, when $n$ and $m$ are large an approximate level $\alpha$ test of $H_0: \mu_x = \mu_y$ versus $H_1: \mu_x \neq \mu_y$ is to

$$\begin{aligned}
&\text{accept } H_0 \text{ if } -z_{\alpha/2} \le \frac{\bar{X} - \bar{Y}}{\sqrt{S_x^2/n + S_y^2/m}} \le z_{\alpha/2} \\
&\text{reject } H_0 \text{ otherwise}
\end{aligned}$$
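The large-sample test can be sketched in the same way. The function name `large_sample_test` and the simulated data below are hypothetical: two normal samples with equal means but standard deviations 1 and 3, so that $H_0$ is true. The normal approximation it relies on is only trustworthy when both $n$ and $m$ are large.

```python
import math
import random
from statistics import NormalDist, mean, variance

def large_sample_test(x, y):
    """Statistic of Equation 8.4.4, (Xbar - Ybar)/sqrt(Sx^2/n + Sy^2/m), and its
    approximate two-sided p-value from the standard normal distribution."""
    ts = (mean(x) - mean(y)) / math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return ts, 2 * (1 - NormalDist().cdf(abs(ts)))

# Hypothetical large samples with equal means and unequal variances
rng = random.Random(7)
x = [rng.gauss(10, 1) for _ in range(400)]
y = [rng.gauss(10, 3) for _ in range(600)]
ts, p = large_sample_test(x, y)
print(f"TS = {ts:.3f}, approximate p-value = {p:.3f}")
```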
The problem of determining an exact level $\alpha$ test of the hypothesis that the means of
two normal populations, having unknown and not necessarily equal variances, are equal is 
known as the Behrens-Fisher problem. There is no completely satisfactory solution known. 

Table 8.4 presents the two-sided tests of this section. 

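TABLE 8.4 $X_1, \ldots, X_n$ Is a Sample from a $\mathcal{N}(\mu_1, \sigma_1^2)$ Population; $Y_1, \ldots, Y_m$ Is a Sample from a $\mathcal{N}(\mu_2, \sigma_2^2)$ Population

The Two Population Samples Are Independent

To Test $H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 \neq \mu_2$

| Assumption | Test Statistic TS | Significance Level $\alpha$ Test | $p$-Value if TS $= t$ |
|---|---|---|---|
| $\sigma_1, \sigma_2$ known | $\frac{\bar{X} - \bar{Y}}{\sqrt{\sigma_1^2/n + \sigma_2^2/m}}$ | Reject if $\lvert \mathrm{TS} \rvert > z_{\alpha/2}$ | $2P\{Z \ge \lvert t \rvert\}$ |
| $\sigma_1 = \sigma_2$ | $\frac{\bar{X} - \bar{Y}}{\sqrt{\frac{(n-1)S_1^2 + (m-1)S_2^2}{n+m-2}\left(\frac{1}{n} + \frac{1}{m}\right)}}$ | Reject if $\lvert \mathrm{TS} \rvert > t_{\alpha/2,n+m-2}$ | $2P\{T_{n+m-2} \ge \lvert t \rvert\}$ |
| $n, m$ large | $\frac{\bar{X} - \bar{Y}}{\sqrt{S_1^2/n + S_2^2/m}}$ | Reject if $\lvert \mathrm{TS} \rvert > z_{\alpha/2}$ | $2P\{Z \ge \lvert t \rvert\}$ |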

8.4.4 The Paired t-Test

Suppose we are interested in determining whether the installation of a certain antipollution 
device will affect a car's mileage. To test this, a collection of n cars that do not have this 
device are gathered. Each car's mileage per gallon is then determined both before and after 
the device is installed. How can we test the hypothesis that the antipollution control has 
no effect on gas consumption? 

The data can be described by the $n$ pairs $(X_i, Y_i)$, $i = 1, \ldots, n$, where $X_i$ is the gas consumption of the $i$th car before installation of the pollution control device, and $Y_i$ of the same car after installation. It is important to note that, since each of the $n$ cars will be inherently different, we cannot treat $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_n$ as being independent samples. For example, if we know that $X_1$ is large (say, 40 miles per gallon), we would expect that $Y_1$ is probably also large. Thus, we cannot employ the earlier methods presented in this section.

One way in which we can test the hypothesis that the antipollution device does not affect gas mileage is to let the data consist of each car's difference in gas mileage. That is, let $W_i = X_i - Y_i$, $i = 1, \ldots, n$. Now, if there is no effect from the device, it should follow that the $W_i$ would have mean 0. Hence, we can test the hypothesis of no effect by testing

$$H_0: \mu_w = 0 \quad \text{versus} \quad H_1: \mu_w \neq 0$$

where $W_1, \ldots, W_n$ are assumed to be a sample from a normal population having unknown mean $\mu_w$ and unknown variance $\sigma_w^2$. The $t$-test described in Section 8.3.2 shows that this can be tested by

$$\begin{aligned}
&\text{accepting } H_0 \text{ if } -t_{\alpha/2,n-1} < \sqrt{n}\,\frac{\bar{W}}{S_w} < t_{\alpha/2,n-1} \\
&\text{rejecting } H_0 \text{ otherwise}
\end{aligned}$$

where $\bar{W}$ and $S_w$ are the sample mean and sample standard deviation of the $W_i$.

EXAMPLE 8.4d An industrial safety program was recently instituted in the computer chip industry. The average weekly loss (averaged over 1 month) in man-hours due to accidents in 10 similar plants both before and after the program is as follows:

| Plant | Before (B) | After (A) | A - B |
|---|---|---|---|
| 1 | 30.5 | 23 | -7.5 |
| 2 | 18.5 | 21 | 2.5 |
| 3 | 24.5 | 22 | -2.5 |
| 4 | 32 | 28.5 | -3.5 |
| 5 | 16 | 14.5 | -1.5 |
| 6 | 15 | 15.5 | .5 |
| 7 | 23.5 | 24.5 | 1 |
| 8 | 25.5 | 21 | -4.5 |
| 9 | 28 | 23.5 | -4.5 |
| 10 | 18 | 16.5 | -1.5 |

Determine, at the 5 percent level of significance, whether the safety program has been 
proven to be effective. 

SOLUTION To determine this, we will test

$$H_0: \mu_A - \mu_B \ge 0 \quad \text{versus} \quad H_1: \mu_A - \mu_B < 0$$

because this will enable us to see whether the null hypothesis that the safety program has not had a beneficial effect is a reasonable possibility. To test this, we run Program 8.3.2, which gives the value of the test statistic as $-2.266$, with

$$p\text{-value} = P\{T_9 \le -2.266\} = .025$$

Since the $p$-value is less than .05, the hypothesis that the safety program has not been effective is rejected and so we can conclude that its effectiveness has been established (at least for any significance level greater than .025). ■
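The paired computation of Example 8.4d amounts to a one-sample $t$-test on the differences. In this Python sketch the differences are formed as after minus before, matching the $A - B$ column, and the critical value $t_{.05,9} = 1.833$ comes from a $t$ table. Variable names are illustrative.

```python
import math
from statistics import mean, stdev

# Example 8.4d: weekly man-hours lost, before and after the safety program
before = [30.5, 18.5, 24.5, 32, 16, 15, 23.5, 25.5, 28, 18]
after = [23, 21, 22, 28.5, 14.5, 15.5, 24.5, 21, 23.5, 16.5]

w = [a - b for a, b in zip(after, before)]   # W_i = A_i - B_i
n = len(w)
ts = math.sqrt(n) * mean(w) / stdev(w)       # one-sample t-statistic with mu_W = 0
t_crit = 1.833                               # t_{.05,9}
reject = ts < -t_crit                        # H1: mu_A - mu_B < 0
print(f"T = {ts:.3f}, reject H0 at level .05: {reject}")
```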

Note that the paired-sample t-test can be used even though the samples are not 
independent and the population variances are unequal. 
8.5 HYPOTHESIS TESTS CONCERNING THE VARIANCE OF A NORMAL POPULATION

Let $X_1, \ldots, X_n$ denote a sample from a normal population having unknown mean $\mu$ and unknown variance $\sigma^2$, and suppose we desire to test the hypothesis

$$H_0: \sigma^2 = \sigma_0^2$$

versus the alternative

$$H_1: \sigma^2 \neq \sigma_0^2$$

for some specified value $\sigma_0^2$.

To obtain a test, recall (as was shown in Section 6.5) that $(n - 1)S^2/\sigma^2$ has a chi-square distribution with $n - 1$ degrees of freedom. Hence, when $H_0$ is true

$$\frac{(n - 1)S^2}{\sigma_0^2} \sim \chi^2_{n-1}$$

and so

$$P_{H_0}\left\{\chi^2_{1-\alpha/2,n-1} \le \frac{(n - 1)S^2}{\sigma_0^2} \le \chi^2_{\alpha/2,n-1}\right\} = 1 - \alpha$$
Therefore, a significance level $\alpha$ test is to

$$\begin{aligned}
&\text{accept } H_0 \text{ if } \chi^2_{1-\alpha/2,n-1} \le \frac{(n - 1)S^2}{\sigma_0^2} \le \chi^2_{\alpha/2,n-1} \\
&\text{reject } H_0 \text{ otherwise}
\end{aligned}$$

The preceding test can be implemented by first computing the value of the test statistic $(n - 1)S^2/\sigma_0^2$; call it $c$. Then compute the probability that a chi-square random variable with $n - 1$ degrees of freedom would be (a) less than and (b) greater than $c$. If either of these probabilities is less than $\alpha/2$, then the hypothesis is rejected. In other words, the $p$-value of the test data is

$$p\text{-value} = 2\min\left(P\{\chi^2_{n-1} < c\},\; 1 - P\{\chi^2_{n-1} < c\}\right)$$

The quantity $P\{\chi^2_{n-1} < c\}$ can be obtained from Program 5.8.1a. The $p$-value for a one-sided test is similarly obtained.

EXAMPLE 8.5a A machine that automatically controls the amount of ribbon on a tape has recently been installed. This machine will be judged to be effective if the standard deviation $\sigma$ of the amount of ribbon on a tape is less than .15 cm. If a sample of 20 tapes yields a sample variance of $S^2 = .025$ cm², are we justified in concluding that the machine is ineffective?
SOLUTION We will test the hypothesis that the machine is effective, since a rejection of this hypothesis will then enable us to conclude that it is ineffective. Since we are thus interested in testing

$$H_0: \sigma^2 \le .0225 \quad \text{versus} \quad H_1: \sigma^2 > .0225$$

it follows that we would want to reject $H_0$ when $S^2$ is large. Hence, the $p$-value of the preceding test data is the probability that a chi-square random variable with 19 degrees of freedom would exceed the observed value of $19S^2/.0225 = 19 \times .025/.0225 = 21.111$. That is,

$$p\text{-value} = P\{\chi^2_{19} > 21.111\} = 1 - .6693 = .3307$$

from Program 5.8.1a. Therefore, we must conclude that the observed value of $S^2 = .025$ is not large enough to reasonably preclude the possibility that $\sigma^2 \le .0225$, and so the null hypothesis is accepted. ■
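The chi-square computation of Example 8.5a can be sketched without the book's chi-square program by integrating the chi-square density numerically. This is an assumption-laden stand-in: `chi2_cdf` applies Simpson's rule to the density on $[0, c]$ and requires at least 2 degrees of freedom, because for 1 degree of freedom the density is unbounded at 0 and this simple rule breaks down. The function names are hypothetical.

```python
import math

def chi2_cdf(c, k, steps=2000):
    """P{chi-square_k <= c} by Simpson's rule (steps even); requires k >= 2."""
    if k < 2:
        raise ValueError("this simple integration needs k >= 2")
    log_norm = -(k / 2) * math.log(2) - math.lgamma(k / 2)

    def pdf(x):
        if x == 0:
            return math.exp(log_norm) if k == 2 else 0.0
        return math.exp(log_norm + (k / 2 - 1) * math.log(x) - x / 2)

    h = c / steps
    s = pdf(0.0) + pdf(c)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * pdf(i * h)
    return s * h / 3

def variance_test_pvalues(s2, n, sigma0_sq):
    """Statistic c = (n - 1)S^2/sigma0^2 with the p-values for
    H1: sigma^2 > sigma0^2 (upper tail) and H1: sigma^2 != sigma0^2."""
    c = (n - 1) * s2 / sigma0_sq
    lower = chi2_cdf(c, n - 1)
    return c, 1 - lower, 2 * min(lower, 1 - lower)

# Example 8.5a: n = 20 tapes, S^2 = .025, sigma0^2 = .15^2 = .0225
c, p_upper, p_two_sided = variance_test_pvalues(0.025, 20, 0.0225)
print(f"c = {c:.3f}, one-sided p-value = {p_upper:.4f}")
```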

8.5.1 Testing for the Equality of Variances of Two Normal Populations

Let $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_m$ denote independent samples from two normal populations having respective (unknown) parameters $\mu_x, \sigma_x^2$ and $\mu_y, \sigma_y^2$, and consider a test of

$$H_0: \sigma_x^2 = \sigma_y^2 \quad \text{versus} \quad H_1: \sigma_x^2 \neq \sigma_y^2$$

If we let

$$S_x^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}, \qquad S_y^2 = \frac{\sum_{i=1}^{m} (Y_i - \bar{Y})^2}{m - 1}$$

denote the sample variances, then as shown in Section 6.5, $(n - 1)S_x^2/\sigma_x^2$ and $(m - 1)S_y^2/\sigma_y^2$ are independent chi-square random variables with $n - 1$ and $m - 1$ degrees of freedom, respectively. Therefore, $(S_x^2/\sigma_x^2)/(S_y^2/\sigma_y^2)$ has an $F$-distribution with parameters $n - 1$ and $m - 1$. Hence, when $H_0$ is true

$$\frac{S_x^2}{S_y^2} \sim F_{n-1,m-1}$$

and so

$$P_{H_0}\left\{F_{1-\alpha/2,n-1,m-1} \le \frac{S_x^2}{S_y^2} \le F_{\alpha/2,n-1,m-1}\right\} = 1 - \alpha$$
Thus, a significance level $\alpha$ test of $H_0$ against $H_1$ is to

$$\begin{aligned}
&\text{accept } H_0 \text{ if } F_{1-\alpha/2,n-1,m-1} < S_x^2/S_y^2 < F_{\alpha/2,n-1,m-1} \\
&\text{reject } H_0 \text{ otherwise}
\end{aligned}$$

The preceding test can be effected by first determining the value of the test statistic $S_x^2/S_y^2$, say its value is $v$, and then computing $P\{F_{n-1,m-1} \le v\}$ where $F_{n-1,m-1}$ is an $F$-random variable with parameters $n - 1$, $m - 1$. If this probability is either less than $\alpha/2$ (which occurs when $S_x^2$ is significantly less than $S_y^2$) or greater than $1 - \alpha/2$ (which occurs when $S_x^2$ is significantly greater than $S_y^2$), then the hypothesis is rejected. In other words, the $p$-value of the test data is

$$p\text{-value} = 2\min\left(P\{F_{n-1,m-1} < v\},\; 1 - P\{F_{n-1,m-1} < v\}\right)$$

The test now calls for rejection whenever the significance level $\alpha$ is at least as large as the $p$-value.

EXAMPLE 8.5b There are two different choices of a catalyst to stimulate a certain chemical process. To test whether the variance of the yield is the same no matter which catalyst is used, a sample of 10 batches is produced using the first catalyst, and 12 using the second. If the resulting data are $S_1^2 = .14$ and $S_2^2 = .28$, can we reject, at the 5 percent level, the hypothesis of equal variance?

SOLUTION Program 5.8.3, which computes the $F$ cumulative distribution function, yields that

$$P\{F_{9,11} \le .5\} = .1539$$

Hence,

$$p\text{-value} = 2\min\{.1539, .8461\} = .3074$$

and so the hypothesis of equal variance cannot be rejected. ■
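Similarly, the $F$ probability in Example 8.5b can be approximated by integrating the $F$ density with Simpson's rule. The helper `f_cdf` is a hypothetical stand-in for Program 5.8.3 and assumes $n - 1 \ge 2$, so that the density is bounded at 0. Small differences from the values quoted above in the last decimal place are to be expected from numerical integration.

```python
import math

def f_cdf(v, d1, d2, steps=2000):
    """P{F_{d1,d2} <= v} by Simpson's rule on the F density; requires d1 >= 2."""
    if d1 < 2:
        raise ValueError("this simple integration needs d1 >= 2")
    log_norm = (math.lgamma((d1 + d2) / 2) - math.lgamma(d1 / 2)
                - math.lgamma(d2 / 2) + (d1 / 2) * math.log(d1 / d2))

    def pdf(x):
        if x == 0:
            return math.exp(log_norm) if d1 == 2 else 0.0
        return math.exp(log_norm + (d1 / 2 - 1) * math.log(x)
                        - ((d1 + d2) / 2) * math.log1p(d1 * x / d2))

    h = v / steps
    s = pdf(0.0) + pdf(v)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * pdf(i * h)
    return s * h / 3

# Example 8.5b: 10 and 12 batches, sample variances .14 and .28
v = 0.14 / 0.28
lower = f_cdf(v, 10 - 1, 12 - 1)
p_value = 2 * min(lower, 1 - lower)
print(f"P(F <= {v:.2f}) = {lower:.4f}, p-value = {p_value:.4f}")
```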

8.6 HYPOTHESIS TESTS IN BERNOULLI POPULATIONS 

The binomial distribution is frequently encountered in engineering problems. For a typical example, consider a production process that manufactures items that can be classified in one of two ways, either as acceptable or as defective. An assumption often made is that each item produced will, independently, be defective with probability $p$, and so the number of defects in a sample of $n$ items will have a binomial distribution with parameters $(n, p)$. We will now consider a test of

$$H_0: p \le p_0 \quad \text{versus} \quad H_1: p > p_0$$

where $p_0$ is some specified value.
If we let $X$ denote the number of defects in the sample of size $n$, then it is clear that
we wish to reject $H_0$ when $X$ is large. To see how large it need be to justify rejection at the
$\alpha$ level of significance, note that

$$P\{X \ge k\} = \sum_{i=k}^{n} P\{X = i\} = \sum_{i=k}^{n} \binom{n}{i} p^i (1-p)^{n-i}$$

Now it is certainly intuitive (and can be proven) that $P\{X \ge k\}$ is an increasing function
of $p$; that is, the probability that the sample will contain at least $k$ defects increases with the
defect probability $p$. Using this, we see that when $H_0$ is true (and so $p \le p_0$),

$$P\{X \ge k\} \le \sum_{i=k}^{n} \binom{n}{i} p_0^i (1-p_0)^{n-i}$$

Hence, a significance level $\alpha$ test of $H_0: p \le p_0$ versus $H_1: p > p_0$ is to reject $H_0$ when

$$X \ge k^*$$

where $k^*$ is the smallest value of $k$ for which $\sum_{i=k}^{n} \binom{n}{i} p_0^i (1-p_0)^{n-i} \le \alpha$. That is,

$$k^* = \min\left\{k : \sum_{i=k}^{n} \binom{n}{i} p_0^i (1-p_0)^{n-i} \le \alpha\right\}$$

This test can best be performed by first determining the value of the test statistic,
say $X = x$, and then computing the $p$-value given by

$$p\text{-value} = P\{\text{Bin}(n, p_0) \ge x\}$$
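The definition of $k^*$ translates directly into a short search. The following is a hypothetical sketch (the function names are illustrative, not from the text) that accumulates the upper tail $P\{\text{Bin}(n, p_0) \ge k\}$ downward from $k = n$; binomial probabilities are computed in log space with `math.lgamma` only to avoid overflow for large $n$.

```python
from math import exp, lgamma, log, log1p


def binom_pmf(i, n, p):
    """P{Bin(n, p) = i} evaluated in log space, for 0 < p < 1."""
    log_coef = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
    return exp(log_coef + i * log(p) + (n - i) * log1p(-p))


def critical_value(n, p0, alpha):
    """Smallest k with P{Bin(n, p0) >= k} <= alpha; reject H0: p <= p0 when X >= k."""
    tail = 0.0
    k_star = n + 1  # no rejection region found yet
    # P{X >= k} only grows as k decreases, so walk down from k = n and
    # stop as soon as the accumulated tail exceeds alpha.
    for k in range(n, -1, -1):
        tail += binom_pmf(k, n, p0)
        if tail > alpha:
            break
        k_star = k
    return k_star


# Setting of Example 8.6a: n = 300, p0 = .02, alpha = .05
k_star = critical_value(300, 0.02, 0.05)
print("reject H0 when X >=", k_star)
```

With these settings the search returns $k^* = 11$. Rejecting when $X \ge k^*$ is equivalent to rejecting when the $p$-value $P\{\text{Bin}(n, p_0) \ge x\}$ is at most $\alpha$, since the tail probability decreases in $x$.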



EXAMPLE 8.6a A computer chip manufacturer claims that no more than 2 percent of the 
chips it sends out are defective. An electronics company, impressed with this claim, has 
purchased a large quantity of such chips. To determine if the manufacturer's claim can be 
taken literally, the company has decided to test a sample of 300 of these chips. If 10 of 
these 300 chips are found to be defective, should the manufacturer's claim be rejected? 

SOLUTION Let us test the claim at the 5 percent level of significance. To see if rejection
is called for, we need to compute the probability that the sample of size 300 would
have resulted in 10 or more defectives when $p$ is equal to .02. (That is, we compute the
$p$-value.) If this probability is less than or equal to .05, then the manufacturer's claim
should be rejected. Now

$$\begin{aligned}
P_{.02}\{X \ge 10\} &= 1 - P_{.02}\{X < 10\} \\
&= 1 - \sum_{i=0}^{9} \binom{300}{i} (.02)^i (.98)^{300-i} \\
&= .0818 \qquad \text{from Program 3.1}
\end{aligned}$$

and so the manufacturer's claim cannot be rejected at the 5 percent level of significance. ■
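The arithmetic of Example 8.6a can be checked with a few lines of Python. This sketch mirrors the displayed formula (one minus the sum of the first ten binomial terms) and uses exact integer binomial coefficients from `math.comb`, standing in for Program 3.1.

```python
from math import comb

# Example 8.6a: n = 300 chips tested, x = 10 defectives, claimed p0 = .02
n, x, p0 = 300, 10, 0.02

# p-value = P_{.02}{X >= 10} = 1 - sum_{i=0}^{9} C(300, i) (.02)^i (.98)^(300 - i)
lower_sum = sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(x))
p_value = 1.0 - lower_sum

alpha = 0.05
print(f"p-value = {p_value:.4f}; reject at 5 percent: {p_value <= alpha}")
```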

When the sample size $n$ is large, we can derive an approximate significance level $\alpha$ test
of $H_0: p \le p_0$ versus $H_1: p > p_0$ by using the normal approximation to the binomial. It
works as follows. Because when $n$ is large $X$ will have approximately a normal distribution
with mean and variance

$$E[X] = np, \qquad \text{Var}(X) = np(1-p)$$

it follows that

$$\frac{X - np}{\sqrt{np(1-p)}}$$

will have approximately a standard normal distribution. Therefore, an approximate
significance level $\alpha$ test would be to reject $H_0$ if

$$\frac{X - np_0}{\sqrt{np_0(1-p_0)}} \ge z_\alpha$$

Equivalently, one can use the normal approximation to approximate the $p$-value.

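EXAMPLE 8.6b In Example 8.6a, $np = 300(.02) = 6$, and $\sqrt{np(1-p)} = \sqrt{5.88}$.
Consequently, the $p$-value that results from the data $X = 10$ is

$$\begin{aligned}
p\text{-value} &= P_{.02}\{X \ge 10\} \\
&= P_{.02}\{X \ge 9.5\} \\
&= P_{.02}\left\{\frac{X - 6}{\sqrt{5.88}} \ge \frac{9.5 - 6}{\sqrt{5.88}}\right\} \\
&\approx P\{Z \ge 1.443\} \\
&= .0745
\end{aligned}$$

where the second equality uses the fact that $X$ is integer valued (the continuity correction). ■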
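A sketch of the approximation in Example 8.6b, assuming the standard normal tail is computed from the complementary error function, $P\{Z \ge z\} = \tfrac{1}{2}\,\mathrm{erfc}(z/\sqrt{2})$; the function names are illustrative. The continuity correction appears as the shift from 10 to 9.5.

```python
from math import erfc, sqrt


def standard_normal_upper_tail(z):
    """P{Z >= z} for a standard normal Z, via the complementary error function."""
    return 0.5 * erfc(z / sqrt(2.0))


def approx_upper_p_value(x, n, p0):
    """Normal approximation to P{Bin(n, p0) >= x}, using P{X >= x} = P{X >= x - .5}."""
    mean = n * p0
    sd = sqrt(n * p0 * (1.0 - p0))
    z = (x - 0.5 - mean) / sd
    return z, standard_normal_upper_tail(z)


# Example 8.6b: n = 300, p0 = .02, observed X = 10
z, p_value = approx_upper_p_value(10, 300, 0.02)
print(f"z = {z:.3f}, approximate p-value = {p_value:.4f}")
```

Comparing the result with the exact value .0818 of Example 8.6a gives a feel for the size of the approximation error at $n = 300$, $p_0 = .02$.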
Suppose now that we want to test the null hypothesis that $p$ is equal to some specified
value; that is, we want to test

$$H_0: p = p_0 \quad \text{versus} \quad H_1: p \ne p_0$$

If $X$, a binomial random variable with parameters $n$ and $p$, is observed to equal $x$, then
a significance level $\alpha$ test would reject $H_0$ if the value $x$ was either significantly larger or
significantly smaller than what would be expected when $p$ is equal to $p_0$. More precisely,
the test would reject $H_0$ if either

$$P\{\text{Bin}(n, p_0) \ge x\} \le \alpha/2 \quad \text{or} \quad P\{\text{Bin}(n, p_0) \le x\} \le \alpha/2$$

In other words, the $p$-value when $X = x$ is

$$p\text{-value} = 2\min\left(P\{\text{Bin}(n, p_0) \ge x\},\ P\{\text{Bin}(n, p_0) \le x\}\right)$$

EXAMPLE 8.6c Historical data indicate that 4 percent of the components produced at 
a certain manufacturing facility are defective. A particularly acrimonious labor dispute has 
recently been concluded, and management is curious about whether it will result in any 
change in this figure of 4 percent. If a random sample of 500 items indicated 16 defectives 
(3.2 percent), is this significant evidence, at the 5 percent level of significance, to conclude 
that a change has occurred? 

SOLUTION To be able to conclude that a change has occurred, the data need to be strong
enough to reject the null hypothesis when we are testing

$$H_0: p = .04 \quad \text{versus} \quad H_1: p \ne .04$$

where $p$ is the probability that an item is defective. The $p$-value of the observed data of 16
defectives in 500 items is

$$p\text{-value} = 2\min\{P\{X \le 16\},\ P\{X \ge 16\}\}$$

where $X$ is a binomial $(500, .04)$ random variable. Since $500 \times .04 = 20$, the observed value 16
lies below the mean, so $P\{X \le 16\}$ is the smaller of the two probabilities and

$$p\text{-value} = 2P\{X \le 16\}$$

Since $X$ has mean 20 and standard deviation $\sqrt{20(.96)} = 4.38$, it is clear that twice the
probability that $X$ will be less than or equal to 16 (a value less than one standard deviation
below the mean) is not going to be small enough to justify rejection. Indeed, it can
be shown that

$$p\text{-value} = 2P\{X \le 16\} = .432$$

and so there is not sufficient evidence to reject the hypothesis that the probability of
a defective item has remained unchanged. ■
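The two-sided $p$-value of Example 8.6c can be computed exactly along the following lines. This is a sketch with illustrative names; evaluating binomial probabilities in log space is an implementation choice, not something the text prescribes.

```python
from math import exp, lgamma, log, log1p


def binom_pmf(i, n, p):
    """P{Bin(n, p) = i}, computed in log space to avoid overflow."""
    log_coef = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
    return exp(log_coef + i * log(p) + (n - i) * log1p(-p))


def two_sided_binomial_p_value(x, n, p0):
    """p-value = 2 min(P{Bin(n, p0) >= x}, P{Bin(n, p0) <= x})."""
    upper = sum(binom_pmf(i, n, p0) for i in range(x, n + 1))
    lower = sum(binom_pmf(i, n, p0) for i in range(0, x + 1))
    return 2.0 * min(upper, lower)


# Example 8.6c: 16 defectives among 500 items, H0: p = .04
p_value = two_sided_binomial_p_value(16, 500, 0.04)
print(f"p-value = {p_value:.3f}")
```

As in the formula of the text, the doubled minimum is not capped at 1, so for $x$ near $np_0$ it can slightly exceed 1; such a value simply indicates no evidence against $H_0$.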
8.6.1 Testing the Equality of Parameters in Two Bernoulli Populations

Suppose there are two distinct methods for producing a certain type of transistor, and
suppose that transistors produced by the first method will, independently, be defective
with probability $p_1$, with the corresponding probability being $p_2$ for those produced by
the second method. To test the hypothesis that $p_1 = p_2$, a sample of $n_1$ transistors is
produced using method 1 and $n_2$ using method 2.

Let $X_1$ denote the number of defective transistors obtained from the first sample and $X_2$
that for the second. Thus, $X_1$ and $X_2$ are independent binomial random variables with respective
parameters $(n_1, p_1)$ and $(n_2, p_2)$. Suppose that $X_1 + X_2 = k$, and so there have been a total
of $k$ defectives. Now, if $H_0$ is true, then each of the $n_1 + n_2$ transistors produced will have
the same probability of being defective, and so the determination of the $k$ defectives will
have the same distribution as a random selection of a sample of size $k$ from a population
of $n_1 + n_2$ items of which $n_1$ are white and $n_2$ are black. In other words, given a total of
$k$ defectives, the conditional distribution of the number of defective transistors obtained
from method 1 will, when $H_0$ is true, have the following hypergeometric distribution*:



$$P_{H_0}\{X_1 = i \mid X_1 + X_2 = k\} = \frac{\binom{n_1}{i}\binom{n_2}{k-i}}{\binom{n_1+n_2}{k}}, \qquad i = 0, 1, \ldots, k \qquad (8.6.1)$$



Now, in testing

$$H_0: p_1 = p_2 \quad \text{versus} \quad H_1: p_1 \ne p_2$$

it seems reasonable to reject the null hypothesis when the proportion of defective transistors
produced by method 1 is much different from the proportion of defectives obtained under
method 2. Therefore, if there is a total of $k$ defectives, then we would expect, when $H_0$
is true, that $X_1/n_1$ (the proportion of defective transistors produced by method 1) would
be close to $(k - X_1)/n_2$ (the proportion of defective transistors produced by method 2).
Because $X_1/n_1$ and $(k - X_1)/n_2$ will be farthest apart when $X_1$ is either very small or very
large, it thus seems that a reasonable significance level $\alpha$ test of $H_0$ versus $H_1$ is as follows.
If $X_1 + X_2 = k$ and the observed value of $X_1$ is $x_1$, then one should

reject $H_0$ if either $P\{X \le x_1\} \le \alpha/2$ or $P\{X \ge x_1\} \le \alpha/2$
accept $H_0$ otherwise



* See Example 5.3b for a formal verification of Equation 8.6.1. 
where $X$ is a hypergeometric random variable with probability mass function

$$P\{X = i\} = \frac{\binom{n_1}{i}\binom{n_2}{k-i}}{\binom{n_1+n_2}{k}}, \qquad i = 0, 1, \ldots, k \qquad (8.6.2)$$

In other words, this test will call for rejection if the significance level is at least as large as
the $p$-value given by

$$p\text{-value} = 2\min\left(P\{X \le x_1\},\ P\{X \ge x_1\}\right) \qquad (8.6.3)$$

This is called the Fisher-Irwin test.

COMPUTATIONS FOR THE FISHER-IRWIN TEST 

To utilize the Fisher-Irwin test, we need to be able to compute the hypergeometric
distribution function. To do so, note that with $X$ having mass function Equation 8.6.2,

$$\frac{P\{X = i+1\}}{P\{X = i\}} = \frac{\binom{n_1}{i+1}\binom{n_2}{k-i-1}}{\binom{n_1}{i}\binom{n_2}{k-i}} \qquad (8.6.4)$$

$$= \frac{(n_1 - i)(k - i)}{(i+1)(n_2 - k + i + 1)} \qquad (8.6.5)$$

where the verification of the final equality is left as an exercise.

Program 8.6.1 uses the preceding identity to compute the $p$-value of the data for the
Fisher-Irwin test of the equality of two Bernoulli probabilities. The program will work
best if the Bernoulli outcome that is called unsuccessful (or defective) is the one whose
probability is less than .5. For instance, if over half the items produced are defective, then
rather than testing that the defect probability is the same in both samples, one should test
that the probability of producing an acceptable item is the same in both samples.

EXAMPLE 8.6d Suppose that method 1 resulted in 20 unacceptable transistors out of 100
produced, whereas method 2 resulted in 12 unacceptable transistors out of 100 produced.
Can we conclude from this, at the 1 percent level of significance, that the two methods
are equivalent?

SOLUTION Upon running Program 8.6.1, we obtain that

$$p\text{-value} = .1763$$

Hence, the hypothesis that the two methods are equivalent cannot be rejected. ■
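The recursion in Equation 8.6.5 lets one build the whole hypergeometric mass function from a single starting value. The sketch below is a hypothetical stand-in for a program like Program 8.6.1, not the book's code: it computes the first probability directly with `math.comb`, fills in the rest with the ratio, and then forms the $p$-value of Equation 8.6.3.

```python
from math import comb


def hypergeometric_pmf(n1, n2, k):
    """Map i -> P{X = i}, built from Equation 8.6.5:
    P{X = i+1} = P{X = i} (n1 - i)(k - i) / ((i + 1)(n2 - k + i + 1))."""
    low, high = max(0, k - n2), min(k, n1)
    # starting value at the smallest possible i, computed directly
    probs = {low: comb(n1, low) * comb(n2, k - low) / comb(n1 + n2, k)}
    for i in range(low, high):
        probs[i + 1] = probs[i] * (n1 - i) * (k - i) / ((i + 1) * (n2 - k + i + 1))
    return probs


def fisher_irwin_p_value(x1, n1, x2, n2):
    """p-value = 2 min(P{X <= x1}, P{X >= x1}) given X1 + X2 = k (Equation 8.6.3)."""
    probs = hypergeometric_pmf(n1, n2, x1 + x2)
    lower = sum(p for i, p in probs.items() if i <= x1)
    upper = sum(p for i, p in probs.items() if i >= x1)
    return 2.0 * min(lower, upper)


# Example 8.6d: 20 of 100 defective under method 1, 12 of 100 under method 2
p_value = fisher_irwin_p_value(20, 100, 12, 100)
print(f"p-value = {p_value:.4f}")
```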
The ideal way to test the hypothesis that the results of two different treatments are
identical is to randomly divide a group of people into a set that will receive the first
treatment and one that will receive the second. However, such randomization is not
always possible. For instance, if we want to study whether drinking alcohol increases the
risk of prostate cancer, we cannot instruct a randomly chosen sample to drink alcohol.
An alternative way to study the hypothesis is to use an observational study that begins by
randomly choosing a set of drinkers and one of nondrinkers. These sets are followed for
a period of time and the resulting data are then used to test the hypothesis that members of
the two groups have the same risk for prostate cancer.

Our next example illustrates another way of performing an observational study.

EXAMPLE 8.6e In 1970, the researchers Herbst, Ulfelder, and Poskanzer (H-U-P) suspected
that vaginal cancer in young women, a rather rare disease, might be caused by
one's mother having taken the drug diethylstilbestrol (usually referred to as DES) while
pregnant. To study this possibility, the researchers could have performed an observational
study by searching for a (treatment) group of women whose mothers took DES
when pregnant and a (control) group of women whose mothers did not. They could then
observe these groups for a period of time and use the resulting data to test the hypothesis
that the probabilities of contracting vaginal cancer are the same for both groups.
However, because vaginal cancer is so rare (in both groups), such a study would require
a large number of individuals in both groups and would probably have to continue for
many years to obtain significant results. Consequently, H-U-P decided on a different
type of observational study. They uncovered 8 women between the ages of 15 and 22
who had vaginal cancer. Each of these women (called cases) was then matched with 4
others, called referents or controls. Each of the referents of a case was free of the cancer
and was born within 5 days in the same hospital and in the same type of room (either
private or public) as the case. Arguing that if DES had no effect on vaginal cancer then the
probability, call it $p_c$, that the mother of a case took DES would be the same as the
probability, call it $p_r$, that the mother of a referent took DES, the researchers H-U-P decided
to test

$$H_0: p_c = p_r \quad \text{against} \quad H_1: p_c \ne p_r$$

Discovering that 7 of the 8 cases had mothers who took DES while pregnant, while
none of the 32 referents had mothers who took the drug, the researchers (see Herbst, A.,
Ulfelder, H., and Poskanzer, D., "Adenocarcinoma of the Vagina: Association of Maternal
Stilbestrol Therapy with Tumor Appearance in Young Women," New England Journal of
Medicine, 284, 878-881, 1971) concluded that there was a strong association between
DES and vaginal cancer. (The $p$-value for these data is approximately 0.) ■

When $n_1$ and $n_2$ are large, an approximate level $\alpha$ test of $H_0: p_1 = p_2$, based on the
normal approximation to the binomial, is outlined in Problem 63.
8.7 TESTS CONCERNING THE MEAN OF A POISSON DISTRIBUTION

Let $X$ denote a Poisson random variable having mean $\lambda$ and consider a test of

$$H_0: \lambda = \lambda_0 \quad \text{versus} \quad H_1: \lambda \ne \lambda_0$$

If the observed value of $X$ is $X = x$, then a level $\alpha$ test would reject $H_0$ if either

$$P_{\lambda_0}\{X \ge x\} \le \alpha/2 \quad \text{or} \quad P_{\lambda_0}\{X \le x\} \le \alpha/2 \qquad (8.7.1)$$

where $P_{\lambda_0}$ means that the probability is computed under the assumption that the Poisson
mean is $\lambda_0$. It follows from Equation 8.7.1 that the $p$-value is given by

$$p\text{-value} = 2\min\left(P_{\lambda_0}\{X \ge x\},\ P_{\lambda_0}\{X \le x\}\right)$$

The preceding probabilities that a Poisson random variable with mean
$\lambda_0$ is greater (less) than or equal to $x$ can be computed by using Program 5.2.

EXAMPLE 8.7a Management's claim that the mean number of defective computer chips 
produced daily is not greater than 25 is in dispute. Test this hypothesis, at the 5 percent 
level of significance, if a sample of 5 days revealed 28, 34, 32, 38, and 22 defective chips. 

SOLUTION Because each individual computer chip has a very small chance of being defective,
it is probably reasonable to suppose that the daily number of defective chips is
approximately a Poisson random variable, with mean, say, $\lambda$. To see whether or not
the manufacturer's claim is credible, we shall test the hypothesis

$$H_0: \lambda \le 25 \quad \text{versus} \quad H_1: \lambda > 25$$

Now, under $H_0$, the total number of defective chips produced over a 5-day period is
Poisson distributed (since the sum of independent Poisson random variables is Poisson)
with a mean no greater than 125. Since the observed total is $28 + 34 + 32 + 38 + 22 = 154$,
it follows that the $p$-value of the data is given by

$$\begin{aligned}
p\text{-value} &= P_{125}\{X \ge 154\} \\
&= 1 - P_{125}\{X \le 153\} \\
&= .0066 \qquad \text{from Program 5.2}
\end{aligned}$$

Therefore, the manufacturer's claim is rejected at the 5 percent (as it would be even at the
1 percent) level of significance. ■
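A standard-library sketch of the Poisson calculation in Example 8.7a, standing in for Program 5.2 (the function names are illustrative). Poisson probabilities are evaluated in log space, and the one-sided $p$-value is $1 - P_{125}\{X \le 153\}$; a two-sided helper following Equation 8.7.1 is included for the test of $H_0: \lambda = \lambda_0$.

```python
from math import exp, lgamma, log


def poisson_pmf(i, lam):
    """P{Poisson(lam) = i}, computed in log space."""
    return exp(-lam + i * log(lam) - lgamma(i + 1))


def poisson_cdf(x, lam):
    """P{Poisson(lam) <= x}."""
    return sum(poisson_pmf(i, lam) for i in range(0, x + 1))


def poisson_two_sided_p_value(x, lam0):
    """p-value = 2 min(P{X >= x}, P{X <= x}) under mean lam0 (Equation 8.7.1)."""
    upper = 1.0 - poisson_cdf(x - 1, lam0)
    return 2.0 * min(upper, poisson_cdf(x, lam0))


# Example 8.7a (one-sided): 5 days of counts, H0: lambda <= 25 per day
counts = [28, 34, 32, 38, 22]
total, lam0 = sum(counts), 5 * 25              # observed total; largest mean under H0
p_value = 1.0 - poisson_cdf(total - 1, lam0)   # P_125{X >= 154}
print(f"total = {total}, p-value = {p_value:.4f}")
```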
REMARK

If Program 5.2 is not available, one can use the fact that a Poisson random variable
with mean $\lambda$ is, for large $\lambda$, approximately normally distributed with a mean and variance
both equal to $\lambda$.

8.7.1 Testing the Relationship Between Two Poisson Parameters

Let $X_1$ and $X_2$ be independent Poisson random variables with respective means $\lambda_1$ and $\lambda_2$,
and consider a test of

$$H_0: \lambda_2 = c\lambda_1 \quad \text{versus} \quad H_1: \lambda_2 \ne c\lambda_1$$

for a given constant $c$. Our test of this is a conditional test (similar in spirit to the Fisher-Irwin
test of Section 8.6.1), which is based on the fact that the conditional distribution
of $X_1$ given the sum of $X_1$ and $X_2$ is binomial. More specifically, we have the following
proposition.

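PROPOSITION 8.7.1

$$P\{X_1 = k \mid X_1 + X_2 = n\} = \binom{n}{k}\left[\frac{\lambda_1}{\lambda_1 + \lambda_2}\right]^k \left[\frac{\lambda_2}{\lambda_1 + \lambda_2}\right]^{n-k}$$

Proof

$$\begin{aligned}
P\{X_1 = k \mid X_1 + X_2 = n\} &= \frac{P\{X_1 = k,\ X_1 + X_2 = n\}}{P\{X_1 + X_2 = n\}} \\
&= \frac{P\{X_1 = k,\ X_2 = n - k\}}{P\{X_1 + X_2 = n\}} \\
&= \frac{P\{X_1 = k\}\, P\{X_2 = n - k\}}{P\{X_1 + X_2 = n\}} \qquad \text{by independence} \\
&= \frac{e^{-\lambda_1}\lambda_1^k/k! \cdot e^{-\lambda_2}\lambda_2^{n-k}/(n-k)!}{e^{-(\lambda_1+\lambda_2)}(\lambda_1+\lambda_2)^n/n!} \\
&= \frac{n!}{(n-k)!\,k!}\left[\frac{\lambda_1}{\lambda_1+\lambda_2}\right]^k \left[\frac{\lambda_2}{\lambda_1+\lambda_2}\right]^{n-k}
\end{aligned}$$

where the denominator in the fourth line uses the fact that $X_1 + X_2$, being the sum of
independent Poisson random variables, is Poisson with mean $\lambda_1 + \lambda_2$. □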

It follows from Proposition 8.7.1 that, if $H_0$ is true, then the conditional distribution of
$X_1$ given that $X_1 + X_2 = n$ is the binomial distribution with parameters $n$ and
$p = \lambda_1/(\lambda_1 + c\lambda_1) = 1/(1+c)$. From this we can conclude that if $X_1 + X_2 = n$,
then $H_0$ should be rejected if the observed value of $X_1$, call it $x_1$, is such that either

$$P\{\text{Bin}(n, 1/(1+c)) \ge x_1\} \le \alpha/2$$

or

$$P\{\text{Bin}(n, 1/(1+c)) \le x_1\} \le \alpha/2$$



EXAMPLE 8.7b An industrial concern runs two large plants. If the number of accidents 
during the last 8 weeks at plant 1 were 16, 18, 9, 22, 17, 19, 24, 8 while the number of 
accidents during the last 6 weeks at plant 2 were 22, 18, 26, 30, 25, 28, can we conclude, 
at the 5 percent level of significance, that the safety conditions differ from plant to plant? 

SOLUTION Since there is a small probability of an industrial accident in any given minute,
it would seem that the weekly number of such accidents should have approximately a
Poisson distribution. If we let $X_1$ denote the total number of accidents during an 8-week
period at plant 1, and let $X_2$ be the number during a 6-week period at plant 2, then if the
safety conditions did not differ at the two plants we would have (since the observation
period at plant 2 is 6/8 as long) that

$$\lambda_2 = \frac{3}{4}\lambda_1$$

where $\lambda_i = E[X_i]$, $i = 1, 2$. Hence, as $X_1 = 133$ and $X_2 = 149$, and $1/(1 + 3/4) = 4/7$, it follows
that the $p$-value of the test of

$$H_0: \lambda_2 = \tfrac{3}{4}\lambda_1 \quad \text{versus} \quad H_1: \lambda_2 \ne \tfrac{3}{4}\lambda_1$$

is given by

$$\begin{aligned}
p\text{-value} &= 2\min\left(P\{\text{Bin}(282, \tfrac{4}{7}) \ge 133\},\ P\{\text{Bin}(282, \tfrac{4}{7}) \le 133\}\right) \\
&= 9.408 \times 10^{-4}
\end{aligned}$$

Thus, the hypothesis that the safety conditions at the two plants are equivalent is
rejected. ■
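Example 8.7b can be reproduced with a sketch like the following (illustrative names; binomial probabilities evaluated in log space). The constant $c = 6/8$ reflects the different observation periods, and the conditional test uses $p = 1/(1+c) = 4/7$.

```python
from math import exp, lgamma, log, log1p


def binom_pmf(i, n, p):
    """P{Bin(n, p) = i}, computed in log space."""
    log_coef = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
    return exp(log_coef + i * log(p) + (n - i) * log1p(-p))


def poisson_ratio_p_value(x1, x2, c):
    """Two-sided p-value for H0: lambda2 = c * lambda1, conditioning on X1 + X2 = n."""
    n, p = x1 + x2, 1.0 / (1.0 + c)
    upper = sum(binom_pmf(i, n, p) for i in range(x1, n + 1))
    lower = sum(binom_pmf(i, n, p) for i in range(0, x1 + 1))
    return 2.0 * min(upper, lower)


plant1 = [16, 18, 9, 22, 17, 19, 24, 8]  # weekly accidents, 8 weeks
plant2 = [22, 18, 26, 30, 25, 28]        # weekly accidents, 6 weeks
c = len(plant2) / len(plant1)            # equal safety => lambda2 = (6/8) lambda1
p_value = poisson_ratio_p_value(sum(plant1), sum(plant2), c)
print(f"X1 = {sum(plant1)}, X2 = {sum(plant2)}, p-value = {p_value:.3e}")
```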

EXAMPLE 8.7c In an attempt to show that proofreader A is superior to proofreader B, both 
proofreaders were given the same manuscript to read. If proofreader A found 28 errors, 
and proofreader B found 18, with 10 of these errors being found by both, can we conclude 
that A is the superior proofreader? 

SOLUTION To begin, we need a model. So let us assume that each manuscript error is
independently found by proofreader A with probability $P_A$ and by proofreader B with
probability $P_B$. To see if the data prove that A is the superior proofreader, we need to
check if they would lead to rejecting the hypothesis that B is at least as good. That is, we need
to test the null hypothesis

$$H_0: P_A \le P_B$$

against the alternative hypothesis

$$H_1: P_A > P_B$$
To determine a test, note that each error can be classified as being of one of 4 types: it
is type 1 if it is found by both proofreaders; it is type 2 if found by A but not by B; it is
type 3 if found by B but not by A; and it is type 4 if found by neither. Thus, under our
independence assumptions, it follows that each error will independently be type $i$ with
probability $p_i$, where

$$p_1 = P_A P_B, \quad p_2 = P_A(1 - P_B), \quad p_3 = (1 - P_A)P_B, \quad p_4 = (1 - P_A)(1 - P_B)$$

Now, if we do our analysis under the assumption that $N$, the total number of errors in the
manuscript, is a random variable that is Poisson distributed with some unknown mean $\lambda$,
then it follows from the results of Section 5.2 that the numbers of errors of types 1, 2, 3,
4 are independent Poisson random variables with respective means $\lambda p_1$, $\lambda p_2$, $\lambda p_3$, $\lambda p_4$.
Now, because $x/(1-x)$ is an increasing function of $x$ in the region $0 < x < 1$,

$$P_A > P_B \iff \frac{P_A}{1 - P_A} > \frac{P_B}{1 - P_B} \iff P_A(1 - P_B) > (1 - P_A)P_B$$

In other words, $P_A > P_B$ if and only if $p_2 > p_3$. As a result, it suffices to use the data to
test

$$H_0: p_2 \le p_3 \quad \text{versus} \quad H_1: p_2 > p_3$$

Therefore, with $N_2$ denoting the number of errors of type 2 (that is, the number of errors
found by A but not by B), and $N_3$ the number of errors of type 3 (that is, the number
found by B but not by A), it follows that we need to test

$$H_0: E[N_2] \le E[N_3] \quad \text{versus} \quad H_1: E[N_2] > E[N_3] \qquad (8.7.2)$$

where $N_2$ and $N_3$ are independent Poisson random variables. Now, by Proposition 8.7.1,
the conditional distribution of $N_2$ given $N_2 + N_3$ is binomial $(n, p)$ where $n = N_2 + N_3$
and $p = E[N_2]/(E[N_2] + E[N_3])$. Because Equation 8.7.2 is equivalent to

$$H_0: p \le 1/2 \quad \text{versus} \quad H_1: p > 1/2$$

it follows that the $p$-value that results when $N_2 = n_2$, $N_3 = n_3$ is

$$p\text{-value} = P\{\text{Bin}(n_2 + n_3, .5) \ge n_2\}$$

For the data given, $n_2 = 28 - 10 = 18$ and $n_3 = 18 - 10 = 8$, yielding that

$$p\text{-value} = P\{\text{Bin}(26, .5) \ge 18\} = .0378$$

Consequently, at the 5 percent level of significance, the null hypothesis is rejected, leading
to the conclusion that A is the superior proofreader. ■
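The final computation of Example 8.7c is small enough to do exactly with integers: with success probability $1/2$, $P\{\text{Bin}(n, .5) \ge n_2\}$ is a sum of binomial coefficients divided by $2^n$. A minimal sketch (variable names are illustrative):

```python
from math import comb

found_by_a, found_by_b, found_by_both = 28, 18, 10
n2 = found_by_a - found_by_both  # type 2 errors: found by A only
n3 = found_by_b - found_by_both  # type 3 errors: found by B only

# p-value = P{Bin(n2 + n3, 1/2) >= n2}; every outcome has probability 2^-(n2 + n3)
n = n2 + n3
p_value = sum(comb(n, i) for i in range(n2, n + 1)) / 2**n
print(f"n2 = {n2}, n3 = {n3}, p-value = {p_value:.4f}")
```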
Problems

1. Consider a trial in which a jury must decide between the hypothesis that the 
defendant is guilty and the hypothesis that he or she is innocent. 

(a) In the framework of hypothesis testing and the U.S. legal system, which of 
the hypotheses should be the null hypothesis? 

(b) What do you think would be an appropriate significance level in this situation? 

2. A colony of laboratory mice consists of several thousand mice. The average 
weight of all the mice is 32 grams with a standard deviation of 4 grams. A 
laboratory assistant was asked by a scientist to select 25 mice for an experiment.
However, before performing the experiment the scientist decided to weigh
the mice as an indicator of whether the assistant's selection constituted a random
sample or whether it was made with some unconscious bias (perhaps the
mice selected were the ones that were slowest in avoiding the assistant, which 
might indicate some inferiority about this group). If the sample mean of the 
25 mice was 30.4, would this be significant evidence, at the 5 percent level 
of significance, against the hypothesis that the selection constituted a random 
sample? 

3. A population distribution is known to have standard deviation 20. Determine the
$p$-value of a test of the hypothesis that the population mean is equal to 50, if the
average of a sample of 64 observations is

(a) 52.5; (b) 55.0; (c) 57.5. 

4. In a certain chemical process, it is very important that a particular solution that 
is to be used as a reactant have a pH of exactly 8.20. A method for determining 
pH that is available for solutions of this type is known to give measurements that 
are normally distributed with a mean equal to the actual pH and with a standard 
deviation of .02. Suppose 10 independent measurements yielded the following 
pH values: 

8.18, 8.17, 8.16, 8.15, 8.17, 8.21, 8.22, 8.16, 8.19, 8.18

(a) What conclusion can be drawn at the $\alpha = .10$ level of significance?

(b) What about at the $\alpha = .05$ level of significance?

5. The mean breaking strength of a certain type of fiber is required to be at least 
200 psi. Past experience indicates that the standard deviation of breaking strength 
is 5 psi. If a sample of 8 pieces of fiber yielded breakage at the following pressures,

210, 198, 195, 202, 197.4, 196, 199, 195.5

would you conclude, at the 5 percent level of significance, that the fiber is
unacceptable? What about at the 10 percent level of significance?

6. It is known that the average height of a man residing in the United States is 5 feet
10 inches and the standard deviation is 3 inches. To test the hypothesis that men
in your city are "average," a sample of 20 men has been chosen. The heights of
the men in the sample follow:

Man   Height in Inches   Man   Height in Inches
 1        72              11        70.4
 2        68.1            12        76
 3        69.2            13        72.5
 4        72.8            14        74
 5        71.2            15        71.8
 6        72.2            16        69.6
 7        70.8            17        75.6
 8        74              18        70.6
 9        66              19        76.2
10        70.3            20        77

What do you conclude? Explain what assumptions you are making. 

7. Suppose in Problem 4 that we wished to design a test so that if the pH were really 
equal to 8.20, then this conclusion will be reached with probability equal to .95. 
On the other hand, if the pH differs from 8.20 by .03 (in either direction), we 
want the probability of picking up such a difference to exceed .95. 

(a) What test procedure should be used? 

(b) What is the required sample size? 

(c) If $\bar{X} = 8.31$, what is your conclusion?

(d) If the actual pH is 8.32, what is the probability of concluding that the pH is 
not 8.20, using the foregoing procedure? 

8. Verify that the approximation in Equation 8.3.7 remains valid even when
$\mu_1 < \mu_0$.

9. A British pharmaceutical company, Glaxo Holdings, has recently developed a new 
drug for migraine headaches. Among the claims Glaxo made for its drug, called 
sumatriptan, was that the mean time it takes for it to enter the bloodstream is less
than 10 minutes. To convince the Food and Drug Administration of the validity of 
this claim, Glaxo conducted an experiment on a randomly chosen set of migraine 
sufferers. To prove its claim, what should they have taken as the null and what as 
the alternative hypothesis? 

10. The weights of salmon grown at a commercial hatchery are normally distributed 
with a standard deviation of 1.2 pounds. The hatchery claims that the mean 
weight of this year's crop is at least 7.6 pounds. Suppose a random sample of 16 
fish yielded an average weight of 7.2 pounds. Is this strong enough evidence to 
reject the hatchery's claims at the 

(a) 5 percent level of significance; 

(b) 1 percent level of significance? 

(c) What is the $p$-value?

11. Consider a test of $H_0: \mu \le 100$ versus $H_1: \mu > 100$. Suppose that a sample of
size 20 has a sample mean of $\bar{X} = 105$. Determine the $p$-value of this outcome if
the population standard deviation is known to equal 

(a) 5; (b) 10; (c) 15. 

12. An advertisement for a new toothpaste claims that it reduces cavities of children in 
their cavity-prone years. Cavities per year for this age group are normal with mean 
3 and standard deviation 1. A study of 2,500 children who used this toothpaste 
found an average of 2.95 cavities per child. Assume that the standard deviation of 
the number of cavities of a child using this new toothpaste remains equal to 1. 

(a) Are these data strong enough, at the 5 percent level of significance, to establish 
the claim of the toothpaste advertisement? 

(b) Do the data convince you to switch to this new toothpaste? 

13. There is some variability in the amount of phenobarbital in each capsule sold
by a manufacturer. However, the manufacturer claims that the mean value is 
20.0 mg. To test this, a sample of 25 pills yielded a sample mean of 19.7 with 
a sample standard deviation of 1.3. What inference would you draw from these 
data? In particular, are the data strong enough evidence to discredit the claim of 
the manufacturer? Use the 5 percent level of significance. 

14. Twenty years ago, entering male high school students of Central High could do 
an average of 24 pushups in 60 seconds. To see whether this remains true today, 
a random sample of 36 freshmen was chosen. If their average was 22.5 with 
a sample standard deviation of 3.1, can we conclude that the mean is no longer 
equal to 24? Use the 5 percent level of significance. 

15. The mean response time of a species of pigs to a stimulus is .8 seconds. Twenty- 
eight pigs were given 2 oz of alcohol and then tested. If their average response time 
was 1.0 seconds with a standard deviation of .3 seconds, can we conclude that
alcohol affects the mean response time? Use the 5 percent level of significance. 

16. Suppose that team A and team B are to play a National Football League game and 
that team A is favored by $f$ points. Let $S(A)$ and $S(B)$ denote the scores of teams
A and B, and let $X = S(A) - S(B) - f$. That is, $X$ is the amount by which team
A beats the point spread. It has been claimed that the distribution of $X$ is normal
with mean 0 and standard deviation 14. Use data from randomly chosen football
games to test this hypothesis. 

17. A medical scientist believes that the average basal temperature of (outwardly) 
healthy individuals has increased over time and is now greater than 98.6 degrees 
Fahrenheit (37 degrees Celsius). To prove this, she has randomly selected 100 
healthy individuals. If their mean temperature is 98.74 with a sample standard 
deviation of 1.1 degrees, does this prove her claim at the 5 percent level? What
about at the 1 percent level? 

18. Use the results of a Sunday's worth of NFL professional football games to test the 
hypothesis that the average number of points scored by winning teams is less than 
or equal to 28. Use the 5 percent level of significance. 

19. Use the results of a Sunday's worth of major league baseball scores to test the 
hypothesis that the average number of runs scored by winning teams is at least 5.6. 
Use the 5 percent level of significance. 

20. A car is advertised as having a gas mileage rating of at least 30 miles/gallon in 
highway driving. If the miles per gallon obtained in 10 independent experiments 
are 26, 24, 20, 25, 27, 25, 28, 30, 26, 33, should you believe the advertisement? 
What assumptions are you making? 

21. A producer specifies that the mean lifetime of a certain type of battery is at least 
240 hours. A sample of 18 such batteries yielded the following data.

237 242 232 

242 248 230 

244 243 254 

262 234 220 

225 236 232 

218 228 240 

Assuming that the life of the batteries is approximately normally distributed, do 
the data indicate that the specifications are not being met? 

22. Use the data of Example 2.3i of Chapter 2 to test the null hypothesis that the 
average noise level directly outside of Grand Central Station is less than or equal 
to 80 decibels. 






23. An oil company claims that the sulfur content of its diesel fuel is at most .15
percent. To check this claim, the sulfur contents of 40 randomly chosen samples 
were determined; the resulting sample mean and sample standard deviation were 
.162 and .040. Using the 5 percent level of significance, can we conclude that the 
company's claims are invalid? 

24. A company supplies plastic sheets for industrial use. A new type of plastic has been 
produced and the company would like to claim that the average stress resistance 
of this new product is at least 30.0, where stress resistance is measured in pounds 
per square inch (psi) necessary to crack the sheet. The following random sample 
was drawn off the production line. Based on this sample, would the claim clearly 
be unjustified? 

30.1 32.7 22.5 27.5 
27.7 29.8 28.9 31.4 

31.2 24.3 26.4 22.8 
29.1 33.4 32.5 21.7 

Assume normality and use the 5 percent level of significance. 

25. It is claimed that a certain type of bipolar transistor has a mean value of current 
gain that is at least 210. A sample of these transistors is tested. If the sample mean 
value of current gain is 200 with a sample standard deviation of 35, would the 
claim be rejected at the 5 percent level of significance if 

(a) the sample size is 25; 

(b) the sample size is 64? 

26. A manufacturer of capacitors claims that the breakdown voltage of these capacitors 
has a mean value of at least 100 V. A test of 12 of these capacitors yielded the 
following breakdown voltages: 

96, 98, 105, 92, 111, 114, 99, 103, 95, 101, 106, 97

Do these results prove the manufacturer's claim? Do they disprove it?

27. A sample of 10 fish was caught at lake A and their PCB concentrations were
measured using a certain technique. The resulting data in parts per million were

Lake A: 11.5, 10.8, 11.6, 9.4, 12.4, 11.4, 12.2, 11, 10.6, 10.8

In addition, a sample of 8 fish was caught at lake B and their levels of PCB were
measured by a different technique than that used at lake A. The resultant data were

Lake B: 11.8, 12.6, 12.2, 12.5, 11.7, 12.1, 10.4, 12.6
If it is known that the measuring technique used at lake A has a variance of .09
whereas the one used at lake B has a variance of .16, could you reject (at the 
5 percent level of significance) a claim that the two lakes are equally contaminated? 

28. A method for measuring the pH level of a solution yields a measurement value 
that is normally distributed with a mean equal to the actual pH of the solution 
and with a standard deviation equal to .05. An environmental pollution scientist 
claims that two different solutions come from the same source. If this were so, then 
the pH level of the solutions would be equal. To test the plausibility of this claim, 
10 independent measurements were made of the pH level for both solutions, with 
the following data resulting. 



Measurements of Solution A    Measurements of Solution B
6.24                          6.27
6.31                          6.25
6.28                          6.33
6.30                          6.27
6.25                          6.24
6.26                          6.31
6.24                          6.28
6.29                          6.29
6.22                          6.34
6.28                          6.27



(a) Do the data disprove the scientist's claim? Use the 5 percent level of 
significance. 

(b) What is the p-value?

29. The following are the values of independent samples from two different 
populations. 



Sample 1: 122, 114, 130, 165, 144, 133, 139, 142, 150

Sample 2: 108, 125, 122, 140, 132, 120, 137, 128, 138



Let $\mu_1$ and $\mu_2$ be the respective means of the two populations. Find the p-value
of the test of the null hypothesis

$$H_0: \mu_1 \le \mu_2$$

versus the alternative

$$H_1: \mu_1 > \mu_2$$

when the population standard deviations are $\sigma_1 = 10$ and
(a) $\sigma_2 = 5$; (b) $\sigma_2 = 10$; (c) $\sigma_2 = 20$.



30. The data below give the lifetimes in hundreds of hours of samples of two types of 
electronic tubes. Past lifetime data of such tubes have shown that they can often be 
modeled as arising from a lognormal distribution. That is, the logarithms of the 
data are normally distributed. Assuming that the variance of the logarithms is equal
for the two populations, test, at the 5 percent level of significance, the hypothesis
that the two population distributions are identical.

Type 1: 32, 84, 37, 42, 78, 62, 59, 74

Type 2: 39, 111, 55, 106, 90, 87, 85



31. The viscosity of two different brands of car oil is measured and the following data 
resulted: 



Brand 1: 10.62, 10.58, 10.33, 10.72, 10.44, 10.74

Brand 2: 10.50, 10.52, 10.58, 10.62, 10.55, 10.51, 10.53



Test the hypothesis that the mean viscosity of the two brands is equal, assuming 
that the populations have normal distributions with equal variances. 

32. It is argued that the resistance of wire A is greater than the resistance of wire B. 
You make tests on each wire with the following results. 



Wire A (ohm)    Wire B (ohm)
.140            .135
.138            .140
.143            .136
.142            .142
.144            .138
.137            .140



What conclusion can you draw at the 10 percent significance level? Explain what 
assumptions you are making. 

In Problems 33 through 40, assume that the population distributions are normal 
and have equal variances. 
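Under this assumption the usual statistic is the pooled-variance t statistic, $(\bar X - \bar Y)/\sqrt{S_p^2(1/n + 1/m)}$ with $S_p^2 = [(n-1)S_x^2 + (m-1)S_y^2]/(n+m-2)$, compared with a t distribution on $n + m - 2$ degrees of freedom. A minimal Python sketch follows; the function names are hypothetical, and the example plugs in the summary figures given in Problem 35.

```python
import math
from statistics import mean, stdev

def pooled_t(xbar, sx, n, ybar, sy, m):
    """Two-sample t statistic with pooled variance (equal-variance assumption)."""
    sp2 = ((n - 1) * sx**2 + (m - 1) * sy**2) / (n + m - 2)
    return (xbar - ybar) / math.sqrt(sp2 * (1 / n + 1 / m))

def pooled_t_from_data(xs, ys):
    """Same statistic computed directly from two raw samples."""
    return pooled_t(mean(xs), stdev(xs), len(xs), mean(ys), stdev(ys), len(ys))

# Summary statistics quoted in Problem 35
t = pooled_t(47700, 2400, 16, 46400, 2200, 16)
print(f"t = {t:.4f} on {16 + 16 - 2} degrees of freedom")
```

For raw data, such as the blood pressure readings of Problem 33, `pooled_t_from_data` computes the sample means and standard deviations first.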

33. Twenty-five men between the ages of 25 and 30, who were participating in a well- 
known heart study carried out in Framingham, Massachusetts, were randomly 
selected. Of these, 11 were smokers and 14 were not. The following data refer to 
readings of their systolic blood pressure. 
Smokers: 124, 134, 136, 125, 133, 127, 135, 131, 133, 125, 118

Nonsmokers: 130, 122, 128, 129, 118, 122, 116, 127, 135, 120, 122, 120, 115, 123



Use these data to test the hypothesis that the mean blood pressures of smokers and 
nonsmokers are the same. 

34. In a 1943 experiment (Whitlock and Bliss, "A Bioassay Technique for Anti- 
helminthics," Journal of Parasitology, 29, pp. 48-58) 10 albino rats were used 
to study the effectiveness of carbon tetrachloride as a treatment for worms. Each 
rat received an injection of worm larvae. After 8 days, the rats were randomly 
divided into two groups of 5 each; each rat in the first group received a dose of 
.032 cc of carbon tetrachloride, whereas the dosage for each rat in the second group 
was .063 cc. Two days later the rats were killed, and the number of adult worms 
in each rat was determined. The numbers detected in the group receiving the .032 
dosage were

421, 462, 400, 378, 413

whereas they were

207, 17, 412, 74, 116

for those receiving the .063 dosage. Do the data prove that the larger dosage is
more effective than the smaller?

35. A professor claims that the average starting salary of industrial engineering 
graduating seniors is greater than that of civil engineering graduates. To study 
this claim, samples of 16 industrial engineers and 16 civil engineers, all of whom 
graduated in 1993, were chosen and sample members were queried about their 
starting salaries. If the industrial engineers had a sample mean salary of $47,700 and 
a sample standard deviation of $2,400, and the civil engineers had a sample mean 
salary of $46,400 and a sample standard deviation of $2,200, has the professor's
claim been verified? Find the appropriate p-value.

36. In a certain experimental laboratory, a method A for producing gasoline from 
crude oil is being investigated. Before completing experimentation, a new method 
B is proposed. All other things being equal, it was decided to abandon A in 
favor of B only if the average yield of the latter was clearly greater. The yield of 
both processes is assumed to be normally distributed. However, there has been 
insufficient time to ascertain their true standard deviations, although there appears 
to be no reason why they cannot be assumed equal. Cost considerations impose
limits on the size of samples that can be obtained. If a 1 percent significance
level is all that is allowed, what would be your recommendation based on the
following random samples? The numbers represent percent yield of crude oil.

Method A: 23.2, 26.6, 24.4, 23.5, 22.6, 25.7, 25.5

Method B: 25.7, 27.7, 26.2, 27.9, 25.0, 21.4, 26.1

37. A study was instituted to learn how the diets of women changed during the winter 
and the summer. A random group of 12 women were observed during the month 
of July and the percentage of each woman's calories that came from fat was deter- 
mined. Similar observations were made on a different randomly selected group of 
size 12 during the month of January. The results were as follows: 



July: 32.2, 27.4, 28.6, 32.4, 40.5, 26.2, 29.4, 25.8, 36.6, 30.3, 28.5, 32.0

January: 30.5, 28.4, 40.2, 37.6, 36.5, 38.8, 34.7, 29.5, 29.7, 37.2, 41.5, 37.0



Test the hypothesis that the mean fat percentage intake is the same for both months. 
Use the (a) 5 percent level of significance and (b) 1 percent level of significance. 

38. To learn about the feeding habits of bats, 22 bats were tagged and tracked by 
radio. Of these 22 bats, 12 were female and 10 were male. The distances flown 
(in meters) between feedings were noted for each of the 22 bats, and the following 
summary statistics were obtained. 



Female Bats        Male Bats
$n = 12$           $m = 10$
$\bar X = 180$     $\bar Y = 136$
$S_x = 92$         $S_y = 86$



Test the hypothesis that the mean distance flown between feedings is the same 
for the populations of both male and of female bats. Use the 5 percent level of 
significance. 
39. The following data summary was obtained from a comparison of the lead content of
human hair removed from adult individuals who had died between 1880 and 1920
with the lead content of present-day adults. The data are in units of micrograms,
equal to one-millionth of a gram.

                              1880-1920    Today
Sample size:                  30           100
Sample mean:                  48.5         26.6
Sample standard deviation:    14.5         12.3

(a) Do the above data establish, at the 1 percent level of significance, that the 
mean lead content of human hair is less today than it was in the years between 
1880 and 1920? Clearly state what the null and alternative hypotheses are. 

(b) What is the p-value for the hypothesis test in part (a)? 

40. Sample weights (in pounds) of newborn babies born in two adjacent counties in 
Western Pennsylvania yielded the following data. 

$n = 53$, $m = 44$

$\bar X = 6.8$, $\bar Y = 7.2$

$S_x^2 = 5.2$, $S_y^2 = 4.9$

Consider a test of the hypothesis that the mean weight of newborns is the same in
both counties. What is the resulting p-value?

41. To verify the hypothesis that blood lead levels tend to be higher for children whose 
parents work in a factory that uses lead in the manufacturing process, researchers 
examined lead levels in the blood of 33 children whose parents worked in a battery 
manufacturing factory. (Morton, D., Saah, A., Silberg, S., Owens, W., Roberts, 
M., and Saah, M., "Lead Absorption in Children of Employees in a Lead-Related 
Industry," American Journal of Epidemiology, 115, 549-555, 1982.) Each of these
children was then matched with another child who was of similar age, lived in a
similar neighborhood, had a similar exposure to traffic, but whose parent did not
work with lead. The blood levels of the 33 cases (sample 1) as well as those of
the 33 controls (sample 2) were then used to test the hypothesis that the average
blood levels of these groups are the same. If the resulting sample means and sample
standard deviations were

$\bar x_1 = .015$, $s_1 = .004$, $\bar x_2 = .006$, $s_2 = .006$

find the resulting p-value. Assume a common variance.
42. Ten pregnant women were given an injection of pitocin to induce labor. Their
systolic blood pressures immediately before and after the injection were:

Patient   Before   After        Patient   Before   After
1         134      140          6         140      138
2         122      130          7         118      124
3         132      135          8         127      126
4         130      126          9         125      132
5         128      134          10        142      144



Do the data indicate that injection of this drug changes blood pressure? 
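Because each woman is measured both before and after the injection, the observations are paired. A common approach is to apply a one-sample t test to the differences, with $n - 1$ degrees of freedom. The Python sketch below assumes the differences are normally distributed; `paired_t` is a hypothetical helper name.

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """Paired t statistic: a one-sample t test on the differences after - before."""
    d = [a - b for b, a in zip(before, after)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))

before = [134, 122, 132, 130, 128, 140, 118, 127, 125, 142]
after = [140, 130, 135, 126, 134, 138, 124, 126, 132, 144]
t = paired_t(before, after)
print(f"t = {t:.3f} on {len(before) - 1} degrees of freedom")
```

Since the question asks whether the drug *changes* blood pressure, the statistic would be compared two-sidedly with $t_{\alpha/2, 9}$.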

43. A question of medical importance is whether jogging leads to a reduction in 
one's pulse rate. To test this hypothesis, 8 nonjogging volunteers agreed to begin 
a 1-month jogging program. After the month their pulse rates were determined 
and compared with their earlier values. If the data are as follows, can we conclude 
that jogging has had an effect on the pulse rates? 



Subject              1    2    3    4     5    6    7    8
Pulse Rate Before    74   86   98   102   78   84   79   70
Pulse Rate After     70   85   90   110   71   80   69   74



44. If $X_1, \ldots, X_n$ is a sample from a normal population having unknown parameters
$\mu$ and $\sigma$, devise a significance level $\alpha$ test of

$$H_0: \sigma^2 \le \sigma_0^2$$

versus the alternative

$$H_1: \sigma^2 > \sigma_0^2$$

for a given positive value $\sigma_0$.

45. In Problem 44, explain how the test would be modified if the population mean $\mu$
were known in advance.

46. A gun-like apparatus has recently been designed to replace needles in administering 
vaccines. The apparatus can be set to inject different amounts of the serum, but 
because of random fluctuations the actual amount injected is normally distributed 
with a mean equal to the setting and with an unknown variance $\sigma^2$. It has been
decided that the apparatus would be too dangerous to use if $\sigma$ exceeds .10. If a
random sample of 50 injections resulted in a sample standard deviation of .08,
should use of the new apparatus be discontinued? Suppose the level of significance
is $\alpha = .10$. Comment on the appropriate choice of a significance level for this
problem, as well as the appropriate choice of the null hypothesis.
47. A pharmaceutical house produces a certain drug item whose weight has a standard
deviation of .5 milligrams. The company's research team has proposed a new
method of producing the drug. However, this entails some costs and will be
adopted only if there is strong evidence that the standard deviation of the weight
of the items will drop to below .4 milligrams. If a sample of 10 items is produced
and has the following weights, should the new method be adopted?

5.728, 5.731, 5.722, 5.719, 5.727, 5.724, 5.718, 5.726, 5.723, 5.722
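For a normal sample, hypotheses about $\sigma$ are usually tested with the statistic $(n-1)S^2/\sigma_0^2$, which has a chi-square distribution with $n-1$ degrees of freedom when $\sigma = \sigma_0$; here small values support $\sigma < .4$. The Python sketch below computes the statistic for the weights listed above with $\sigma_0 = .4$; the helper name is hypothetical, and the critical value $\chi^2_{1-\alpha, n-1}$ would still come from a table.

```python
from statistics import variance

def chi_square_variance_stat(xs, sigma0):
    """(n - 1) * S^2 / sigma0^2: chi-square with n - 1 df when sigma = sigma0."""
    return (len(xs) - 1) * variance(xs) / sigma0**2

weights = [5.728, 5.731, 5.722, 5.719, 5.727, 5.724, 5.718, 5.726, 5.723, 5.722]
stat = chi_square_variance_stat(weights, sigma0=0.4)
print(f"chi-square statistic = {stat:.6f} on {len(weights) - 1} df")
```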



48. The production of large electrical transformers and capacitors requires the use of
polychlorinated biphenyls (PCBs), which are extremely hazardous when released
into the environment. Two methods have been suggested to monitor the levels
of PCB in fish near a large plant. It is believed that each method will result in
a normal random variable that depends on the method. Test the hypothesis at
the $\alpha = .10$ level of significance that both methods have the same variance, if a
given fish is checked 8 times by each method with the following data (in parts per
million) recorded.

Method 1: 6.2, 5.8, 5.7, 6.3, 5.9, 6.1, 6.2, 5.7

Method 2: 6.3, 5.7, 5.9, 6.4, 5.8, 6.2, 6.3, 5.5
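The usual test of equal normal variances uses the ratio $S_1^2/S_2^2$, which has an F distribution with $(n-1, m-1)$ degrees of freedom when the variances are equal; for a two-sided test at level $\alpha$ it is compared with the upper and lower $\alpha/2$ points of that distribution. The Python sketch below computes only the statistic from the data above; looking up the F critical values is left to a table.

```python
from statistics import variance

method_1 = [6.2, 5.8, 5.7, 6.3, 5.9, 6.1, 6.2, 5.7]
method_2 = [6.3, 5.7, 5.9, 6.4, 5.8, 6.2, 6.3, 5.5]

# Ratio of sample variances; F with (n - 1, m - 1) df when the variances are equal
f_stat = variance(method_1) / variance(method_2)
df = (len(method_1) - 1, len(method_2) - 1)
print(f"F = {f_stat:.4f} with {df} degrees of freedom")
```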



49. In Problem 31, test the hypothesis that the populations have the same variances. 

50. If $X_1, \ldots, X_n$ is a sample from a normal population with variance $\sigma_x^2$, and
$Y_1, \ldots, Y_n$ is an independent sample from a normal population with variance $\sigma_y^2$,
develop a significance level $\alpha$ test of

$$H_0: \sigma_x^2 \le \sigma_y^2 \quad \text{versus} \quad H_1: \sigma_x^2 > \sigma_y^2$$

51. The amount of surface wax on each side of waxed paper bags is believed to be 
normally distributed. However, there is reason to believe that there is greater 
variation in the amount on the inner side of the paper than on the outside. 
A sample of 75 observations of the amount of wax on each side of these bags 
is obtained and the following data recorded. 



Wax in Pounds per Unit Area of Sample

Outside Surface          Inside Surface
$\bar x = .948$          $\bar y = .652$
$\sum x_i^2 = 91$        $\sum y_i^2 = 82$

Conduct a test to determine whether or not the variability of the amount of wax on
the inner surface is greater than the variability of the amount on the outer surface
($\alpha = .05$).

52. In a famous experiment to determine the efficacy of aspirin in preventing heart 
attacks, 22,000 healthy middle-aged men were randomly divided into two equal 
groups, one of which was given a daily dose of aspirin and the other a placebo that 
looked and tasted identical to the aspirin. The experiment was halted at a time 
when 104 men in the aspirin group and 189 in the control group had had heart 
attacks. Use these data to test the hypothesis that the taking of aspirin does not 
change the probability of having a heart attack. 

53. In the study of Problem 52, it also resulted that 119 from the aspirin group and 
98 from the control group suffered strokes. Are these numbers significant enough to show
that taking aspirin changes the probability of having a stroke?

54. A standard drug is known to be effective in 72 percent of the cases in which it 
is used to treat a certain infection. A new drug has been developed and testing 
has found it to be effective in 42 cases out of 50. Is this strong enough evidence 
to prove that the new drug is more effective than the old one? Find the relevant
p-value.

55. Three independent news services are running a poll to determine if over half the 
population supports an initiative concerning limitations on driving automobiles 
in the downtown area. Each wants to see if the evidence indicates that over half 
the population is in favor. As a result, all three services will be testing 

$$H_0: p \le .5 \quad \text{versus} \quad H_1: p > .5$$

where $p$ is the proportion of the population in favor of the initiative.

(a) Suppose the first news organization samples 100 people, of which 56 are in 
favor of the initiative. Is this strong enough evidence, at the 5 percent level 
of significance, to reject the null hypothesis and so establish that over half the 
population favors the initiative? 

(b) Suppose the second news organization samples 120 people, of which 68 are 
in favor of the initiative. Is this strong enough evidence, at the 5 percent level 
of significance, to reject the null hypothesis? 

(c) Suppose the third news organization samples 110 people, of which 62 are in 
favor of the initiative. Is this strong enough evidence, at the 5 percent level of 
significance, to reject the null hypothesis? 

(d) Suppose the news organizations combine their samples, to come up with 
a sample of 330 people, of which 186 support the initiative. Is this strong 
enough evidence, at the 5 percent level of significance, to reject the null 
hypothesis? 
56. According to the U.S. Bureau of the Census, 25.5 percent of the population of
those aged 18 or over smoked in 1990. A scientist has recently claimed that this
percentage has since increased, and to prove her claim she randomly sampled 500 
individuals from this population. If 138 of them were smokers, is her claim proved? 
Use the 5 percent level of significance. 

57. An ambulance service claims that at least 45 percent of its calls involve life- 
threatening emergencies. To check this claim, a random sample of 200 calls 
was selected from the service's files. If 70 of these calls involved life-threatening 
emergencies, is the service's claim believable at the 

(a) 5 percent level of significance; 

(b) 1 percent level of significance? 

58. A standard drug is known to be effective in 75 percent of the cases in which it 
is used to treat a certain infection. A new drug has been developed and has been 
found to be effective in 42 cases out of 50. Based on this, would you accept, at 
the 5 percent level of significance, the hypothesis that the two drugs are of equal 
effectiveness? What is the p-value?
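An exact test of $H_0: p = .75$ can use the binomial distribution directly: under $H_0$ the number of cures among 50 patients is binomial with parameters $(50, .75)$. The Python sketch below computes the tail probabilities with `math.comb`; taking twice the smaller tail as the two-sided p-value is one common convention, and the helper names are hypothetical.

```python
import math

def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(j, n, p) for j in range(k, n + 1))

def binom_lower_tail(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(j, n, p) for j in range(0, k + 1))

n, p0, successes = 50, 0.75, 42
upper = binom_upper_tail(successes, n, p0)
lower = binom_lower_tail(successes, n, p0)
# Two-sided p-value: twice the smaller tail, capped at 1
p_value = min(1.0, 2 * min(upper, lower))
print(f"P(X >= 42) = {upper:.4f}, two-sided p-value = {p_value:.4f}")
```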

59. Do Problem 58 by using a test based on the normal approximation to the 
binomial. 

60. In a recently conducted poll, 54 out of 200 people surveyed claimed to have a 
firearm in their homes. In a similar survey done earlier, 30 out of 150 people 
made that claim. Is it possible that the proportion of the population having 
firearms has not changed and the foregoing is due to the inherent randomness in 
sampling? 

61. Let $X_1$ denote a binomial random variable with parameters $(n_1, p_1)$ and $X_2$ an
independent binomial random variable with parameters $(n_2, p_2)$. Develop a test,
using the same approach as in the Fisher-Irwin test, of

$$H_0: p_1 \le p_2$$

versus the alternative

$$H_1: p_1 > p_2$$

62. Verify that Equation 8.6.5 follows from Equation 8.6.4. 

63. Let $X_1$ and $X_2$ be binomial random variables with respective parameters $n_1, p_1$
and $n_2, p_2$. Show that when $n_1$ and $n_2$ are large, an approximate level $\alpha$ test of
$H_0: p_1 = p_2$ versus $H_1: p_1 \ne p_2$ is as follows:

$$\text{reject } H_0 \text{ if } \frac{\left|X_1/n_1 - X_2/n_2\right|}{\sqrt{\dfrac{X_1 + X_2}{n_1 + n_2}\left(1 - \dfrac{X_1 + X_2}{n_1 + n_2}\right)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} > z_{\alpha/2}$$

Hint: (a) Argue first that when $n_1$ and $n_2$ are large

$$\frac{\dfrac{X_1}{n_1} - \dfrac{X_2}{n_2} - (p_1 - p_2)}{\sqrt{\dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}}} \;\dot\sim\; N(0, 1)$$

where $\dot\sim$ means "approximately has the distribution."

(b) Now argue that when $H_0$ is true, and so $p_1 = p_2$, their common value
can be best estimated by $(X_1 + X_2)/(n_1 + n_2)$.
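The statistic in this test is straightforward to compute. A minimal Python sketch, assuming the large-sample conditions of the problem hold, is shown below; the function name is hypothetical, and it is applied to the counts of Problem 60 (54 of 200 versus 30 of 150).

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic, as in Problem 63."""
    pooled = (x1 + x2) / (n1 + n2)  # estimate of the common p under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se

z = two_proportion_z(54, 200, 30, 150)  # Problem 60 data
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)  # z_{alpha/2} for alpha = .05
print(f"|z| = {abs(z):.3f}, z_.025 = {z_crit:.3f}, reject H0: {abs(z) > z_crit}")
```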

64. Use the approximate test given in Problem 63 on the data of Problem 60. 

65. Patients suffering from cancer must often decide whether to have their tumors 
treated with surgery or with radiation. A factor in their decision is the 5-year 
survival rates for these treatments. Surprisingly, it has been found that patients'
decisions often seem to be affected by whether they are told the 5-year survival
rates or the 5-year death rates (even though the information content is identical). 
For instance, in an experiment a group of 200 male prostate cancer patients were 
randomly divided into two groups of size 100 each. Each member of the first group 
was told that the 5-year survival rate for those electing surgery was 77 percent, 
whereas each member of the second group was told that the 5-year death rate for 
those electing surgery was 23 percent. Both groups were given the same information 
about radiation therapy. If it resulted that 24 members of the first group and 12 
of the second group elected to have surgery, what conclusions would you draw? 

66. The following data refer to Larry Bird's results when shooting a pair of free throws in 
basketball. During two consecutive seasons in the National Basketball Association, 
Bird shot a pair of free throws on 338 occasions. On 251 occasions he made both 
shots; on 34 occasions he made the first shot but missed the second one; on 48 
occasions he missed the first shot but made the second one; on 5 occasions he 
missed both shots. 

(a) Use these data to test the hypothesis that Bird's probability of making the first 
shot is equal to his probability of making the second shot. 

(b) Use these data to test the hypothesis that Bird's probability of making the 
second shot is the same regardless of whether he made or missed the first one. 

67. In the nineteen seventies, the U.S. Veterans Administration (Murphy, 1977) con- 
ducted an experiment comparing coronary artery bypass surgery with medical 
drug therapy as treatments for coronary artery disease. The experiment involved 
596 patients, of whom 286 were randomly assigned to receive surgery, with the 
remaining 310 assigned to drug therapy. A total of 252 of those receiving surgery, 
and a total of 270 of those receiving drug therapy were still alive three years after 
treatment. Use these data to test the hypothesis that the survival probabilities are
equal. 

68. Test the hypothesis, at the .05 level of significance, that the yearly number of 
earthquakes felt on a certain island has mean 52 if the readings for the last 8 years 
are 46, 62, 60, 58, 47, 50, 59, 49. Assume an underlying Poisson distribution and 
give an explanation to justify this assumption. 
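One way to carry out such a test: if the yearly counts are independent Poisson random variables with mean 52, their sum over the 8 years is Poisson with mean 416, so the observed total can be compared with that distribution. The Python sketch below computes an exact two-sided p-value; the twice-the-smaller-tail rule and the helper names are assumptions of this illustration.

```python
import math

def poisson_pmf(k, mu):
    # Computed on the log scale to avoid overflow for large k and mu
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))

def poisson_two_sided_p(total, mu):
    """Two-sided p-value for an observed Poisson(mu) count: twice the smaller tail."""
    lower = sum(poisson_pmf(k, mu) for k in range(total + 1))  # P(T <= total)
    upper = 1 - lower + poisson_pmf(total, mu)  # P(T >= total)
    return min(1.0, 2 * min(lower, upper))

counts = [46, 62, 60, 58, 47, 50, 59, 49]
# Under H0 the total of 8 independent Poisson(52) counts is Poisson(8 * 52)
total, mu = sum(counts), 8 * 52
print(f"total = {total}, mean under H0 = {mu}, p-value = {poisson_two_sided_p(total, mu):.3f}")
```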

69. The following table gives the number of fatal accidents of U.S. commercial airline 
carriers in the 16 years from 1980 to 1995. Do these data disprove, at the 5 percent 
level of significance, the hypothesis that the mean number of accidents in a year is 
greater than or equal to 4.5? What is the p-value? (Hint: First formulate a model
for the number of accidents.)

U.S. Airline Safety, Scheduled Commercial Carriers, 1980-1995

Year   Departures   Fatal       Fatalities   Fatal Accidents per
       (millions)   Accidents                100,000 Departures
1980   5.4          0           0            .000
1981   5.2          4           4            .077
1982   5.0          4           233          .060
1983   5.0          4           15           .079
1984   5.4          1           4            .018
1985   5.8          4           197          .069
1986   6.4          2           5            .016
1987   6.6          4           231          .046
1988   6.7          3           285
1989   6.6          11          278
1990   6.9          6           39
1991   6.8          4           62
1992   7.1          4           33
1993   7.2          1           1
1994   7.5          4           239
1995   8.1          2           166

Source: National Transportation Safety Board

70. For the following data, sample 1 is from a Poisson distribution with mean $\lambda_1$ and
sample 2 is from a Poisson distribution with mean $\lambda_2$. Test the hypothesis that
$\lambda_1 = \lambda_2$.

Sample 1: 24, 32, 29, 33, 40, 28, 34, 36

Sample 2: 42, 36, 41



71. A scientist looking into the effect of smoking on heart disease has chosen a large 
random sample of smokers and of nonsmokers. She plans to study these two 
groups for 5 years to see if the number of heart attacks among the members of the 
smokers' group is significantly greater than the number among the nonsmokers. 
Such a result, the scientist feels, should be strong evidence of an association between 
smoking and heart attacks. Given that

1. Older people are at greater risk of heart disease than are younger people; and 

2. As a group, smokers tend to be somewhat older than nonsmokers; 

would the scientist be justified in her conclusion? Explain how the experi- 
mental design can be improved so that meaningful conclusions can be drawn. 

72. A researcher wants to analyze the average yearly increase in a stock over a 20-year
period. To do so, she plans to randomly choose 100 stocks from the listing of 
current stocks, discarding any that were not in existence 20 years ago. She will then 
compare the current price of each stock with its price 20 years ago to determine 
its percentage increase. Do you think this is a valid method to study the average 
increase in the price of a stock? 



Chapter 9 

REGRESSION 



9.1 INTRODUCTION 

Many engineering and scientific problems are concerned with determining a relationship 
between a set of variables. For instance, in a chemical process, we might be interested in 
the relationship between the output of the process, the temperature at which it occurs, 
and the amount of catalyst employed. Knowledge of such a relationship would enable us 
to predict the output for various values of temperature and amount of catalyst. 

In many situations, there is a single response variable $Y$, also called the dependent variable,
which depends on the values of a set of input, also called independent, variables
$x_1, \ldots, x_r$. The simplest type of relationship between the dependent variable $Y$ and the
input variables $x_1, \ldots, x_r$ is a linear relationship. That is, for some constants $\beta_0, \beta_1, \ldots, \beta_r$
the equation

$$Y = \beta_0 + \beta_1 x_1 + \cdots + \beta_r x_r \qquad (9.1.1)$$

would hold. If this were the relationship between $Y$ and the $x_i$, $i = 1, \ldots, r$, then it would
be possible (once the $\beta_i$ were learned) to predict the response exactly for any set of input
values. However, in practice, such precision is almost never attainable, and the most that
one can expect is that Equation 9.1.1 would be valid subject to random error. By this we
mean that the explicit relationship is

$$Y = \beta_0 + \beta_1 x_1 + \cdots + \beta_r x_r + e \qquad (9.1.2)$$

where $e$, representing the random error, is assumed to be a random variable having mean
0. Indeed, another way of expressing Equation 9.1.2 is as follows:

$$E[Y \mid \mathbf{x}] = \beta_0 + \beta_1 x_1 + \cdots + \beta_r x_r$$

where $\mathbf{x} = (x_1, \ldots, x_r)$ is the set of independent variables, and $E[Y \mid \mathbf{x}]$ is the expected
response given the inputs $\mathbf{x}$.

Equation 9.1.2 is called a linear regression equation. We say that it describes the regression
of $Y$ on the set of independent variables $x_1, \ldots, x_r$. The quantities $\beta_0, \beta_1, \ldots, \beta_r$ are called
the regression coefficients, and must usually be estimated from a set of data. A regression
equation containing a single independent variable (that is, one in which $r = 1$) is
called a simple regression equation, whereas one containing many independent variables is
called a multiple regression equation.

Thus, a simple linear regression model supposes a linear relationship between the mean
response and the value of a single independent variable. It can be expressed as

$$Y = \alpha + \beta x + e$$

where $x$ is the value of the independent variable, also called the input level, $Y$ is the
response, and $e$, representing the random error, is a random variable having mean 0.
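One way to see what the model says is to simulate it: at a fixed input level $x$ the responses scatter around the line, and their average approaches $\alpha + \beta x$ because the error has mean 0. The Python sketch below assumes normal errors (so far the text only requires mean 0) and uses made-up parameter values; the helper name is hypothetical.

```python
import random
from statistics import mean

def simulate_responses(alpha, beta, x, sigma, size, rng):
    """Draw responses Y = alpha + beta * x + e with normal errors of mean 0."""
    return [alpha + beta * x + rng.gauss(0.0, sigma) for _ in range(size)]

rng = random.Random(1)  # fixed seed so the run is reproducible
alpha, beta, x = 2.0, 0.5, 10.0  # hypothetical parameter values
ys = simulate_responses(alpha, beta, x, sigma=1.0, size=100_000, rng=rng)
print(f"sample mean of Y = {mean(ys):.3f}, alpha + beta * x = {alpha + beta * x}")
```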

EXAMPLE 9.1a Consider the following 10 data pairs $(x_i, y_i)$, $i = 1, \ldots, 10$, relating $y$, the
percent yield of a laboratory experiment, to $x$, the temperature at which the experiment
was run.

$i$    $x_i$    $y_i$        $i$     $x_i$    $y_i$
1      100      45           6       150      68
2      110      52           7       160      75
3      120      54           8       170      76
4      130      63           9       180      92
5      140      62           10      190      88

A plot of $y_i$ versus $x_i$, called a scatter diagram, is given in Figure 9.1. As this scatter
diagram appears to reflect, subject to random error, a linear relation between $y$ and $x$, it
seems that a simple linear regression model would be appropriate. ■
FIGURE 9.1 Scatter plot.

9.2 LEAST SQUARES ESTIMATORS OF THE
REGRESSION PARAMETERS

Suppose that the responses $Y_i$ corresponding to the input values $x_i$, $i = 1, \ldots, n$, are to be
observed and used to estimate $\alpha$ and $\beta$ in a simple linear regression model. To determine
estimators of $\alpha$ and $\beta$ we reason as follows: If $A$ is the estimator of $\alpha$ and $B$ of $\beta$, then the
estimator of the response corresponding to the input variable $x_i$ would be $A + Bx_i$. Since
the actual response is $Y_i$, the squared difference is $(Y_i - A - Bx_i)^2$, and so if $A$ and $B$ are
the estimators of $\alpha$ and $\beta$, then the sum of the squared differences between the estimated
responses and the actual response values (call it $SS$) is given by

$$SS = \sum_{i=1}^n (Y_i - A - Bx_i)^2$$

The method of least squares chooses as estimators of $\alpha$ and $\beta$ the values of $A$ and $B$ that
minimize $SS$. To determine these estimators, we differentiate $SS$ first with respect to $A$ and
then with respect to $B$ as follows:

$$\frac{\partial SS}{\partial A} = -2 \sum_{i=1}^n (Y_i - A - Bx_i)$$

$$\frac{\partial SS}{\partial B} = -2 \sum_{i=1}^n x_i (Y_i - A - Bx_i)$$

Setting these partial derivatives equal to zero yields the following equations for the
minimizing values $A$ and $B$:

$$\begin{aligned}
\sum_{i=1}^n Y_i &= nA + B \sum_{i=1}^n x_i \\
\sum_{i=1}^n x_i Y_i &= A \sum_{i=1}^n x_i + B \sum_{i=1}^n x_i^2
\end{aligned} \qquad (9.2.1)$$

The Equations 9.2.1 are known as the normal equations. If we let

$$\bar Y = \sum_i Y_i / n, \qquad \bar x = \sum_i x_i / n$$

then we can write the first normal equation as

$$A = \bar Y - B\bar x \qquad (9.2.2)$$
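The normal equations are just two linear equations in the two unknowns $A$ and $B$, so they can also be solved directly, for instance by Cramer's rule. The Python sketch below does this for the data of Example 9.1a; solving by Cramer's rule is a choice made for this illustration, and the function name is hypothetical.

```python
def solve_normal_equations(xs, ys):
    """Solve the two normal equations (9.2.1) for A and B by Cramer's rule."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # System:  n*A  + sx*B  = sy
    #          sx*A + sxx*B = sxy
    det = n * sxx - sx * sx
    a = (sy * sxx - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    return a, b

# Example 9.1a: temperature x and percent yield y
xs = [100, 110, 120, 130, 140, 150, 160, 170, 180, 190]
ys = [45, 52, 54, 63, 62, 68, 75, 76, 92, 88]
a, b = solve_normal_equations(xs, ys)
print(f"A = {a:.4f}, B = {b:.4f}")
```

The result agrees with Equation 9.2.2: $A$ equals $\bar Y - B\bar x$ for these data.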
Substituting this value of $A$ into the second normal equation yields

$$\sum_i x_i Y_i = (\bar Y - B\bar x)\, n\bar x + B \sum_i x_i^2$$

or

$$B\left(\sum_i x_i^2 - n\bar x^2\right) = \sum_i x_i Y_i - n\bar x \bar Y$$

or

$$B = \frac{\sum_i x_i Y_i - n\bar x \bar Y}{\sum_i x_i^2 - n\bar x^2}$$

Hence, using Equation 9.2.2 and the fact that $n\bar Y = \sum_{i=1}^n Y_i$, we have proven the
following proposition.

PROPOSITION 9.2.1 The least squares estimators of $\beta$ and $\alpha$ corresponding to the data set
$x_i, Y_i$, $i = 1, \ldots, n$ are, respectively,

$$B = \frac{\sum_{i=1}^n x_i Y_i - \bar x \sum_{i=1}^n Y_i}{\sum_{i=1}^n x_i^2 - n\bar x^2}$$

$$A = \bar Y - B\bar x$$

The straight line $A + Bx$ is called the estimated regression line.

Program 9.2 computes the least squares estimators A and B. It also gives the user the 
option of computing some other statistics whose values will be needed in the following 
sections. 

EXAMPLE 9.2a The raw material used in the production of a certain synthetic fiber is stored
in a location without a humidity control. Measurements of the relative humidity in the
storage location and the moisture content of a sample of the raw material were taken over
15 days with the following data (in percentages) resulting.

Relative humidity   46  53  29  61  36  39  47  49  52  38  55  32  57  54  44
Moisture content    12  15   7  17  10  11  11  12  14   9  16   8  18  14  12

FIGURE 9.2 Example 9.2a. [Plot against relative humidity, showing the data and the estimated regression line.]

These data are plotted in Figure 9.2. To compute the least squares estimators and the
estimated regression line, we run Program 9.2; results are shown in Figure 9.3. ■
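Program 9.2 itself is not reproduced here. As a stand-in, the following Python sketch (a hypothetical reimplementation, not the book's program) applies Proposition 9.2.1 to the data of Example 9.2a. It writes $S_{xY} = \sum_i (x_i - \bar x)(Y_i - \bar Y)$, $S_{xx} = \sum_i (x_i - \bar x)^2$, $S_{YY} = \sum_i (Y_i - \bar Y)^2$, and obtains the residual sum of squares from the standard identity $SS_R = S_{YY} - S_{xY}^2/S_{xx}$.

```python
def least_squares(xs, ys):
    """Least squares estimates A, B (Proposition 9.2.1) plus summary sums."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    s_xx = sum((x - xbar) ** 2 for x in xs)
    s_yy = sum((y - ybar) ** 2 for y in ys)
    b = s_xy / s_xx
    a = ybar - b * xbar
    ss_r = s_yy - s_xy**2 / s_xx  # sum of squared residuals
    return a, b, s_xy, s_xx, s_yy, ss_r

humidity = [46, 53, 29, 61, 36, 39, 47, 49, 52, 38, 55, 32, 57, 54, 44]
moisture = [12, 15, 7, 17, 10, 11, 11, 12, 14, 9, 16, 8, 18, 14, 12]
a, b, s_xy, s_xx, s_yy, ss_r = least_squares(humidity, moisture)
print(f"Y = {a:.2f} + {b:.2f}x")
print(f"S(x,Y) = {s_xy:.1f}, S(x,x) = {s_xx:.2f}, S(Y,Y) = {s_yy:.1f}, SSR = {ss_r:.2f}")
```

Its printed values match the output reported in Figure 9.3: $Y = -2.51 + 0.32x$, $S(x, Y) = 416.2$, $S(x, x) = 1287.73$, $S(Y, Y) = 147.6$, and $SS_R = 13.08$.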



9.3 DISTRIBUTION OF THE ESTIMATORS

To specify the distribution of the estimators $A$ and $B$, it is necessary to make additional
assumptions about the random errors aside from just assuming that their mean is 0. The
usual approach is to assume that the random errors are independent normal random
variables having mean 0 and variance $\sigma^2$. That is, we suppose that if $Y_i$ is the response
corresponding to the input value $x_i$, then $Y_1, \ldots, Y_n$ are independent and

$$Y_i \sim N(\alpha + \beta x_i, \sigma^2)$$

Note that the foregoing supposes that the variance of the random error does not depend
on the input value but rather is a constant. This value $\sigma^2$ is not assumed to be known but
rather must be estimated from the data.

Since the least squares estimator $B$ of $\beta$ can be expressed as

$$B = \frac{\sum_i (x_i - \bar x) Y_i}{\sum_i x_i^2 - n\bar x^2} \qquad (9.3.1)$$
FIGURE 9.3 Output of Program 9.2 for the data of Example 9.2a.

Simple Linear Regression
Sample size = 15
The least squares estimators are as follows:
a = -2.51
b = 0.32
Average x value = 46.13
Sum of squares of the x values = 33212.0
The estimated regression line is Y = -2.51 + 0.32x
S(x, Y) = 416.2
S(x, x) = 1287.73
S(Y, Y) = 147.6
SSR = 13.08

we see that it is a linear combination of the independent normal random variables $Y_i$, 
$i = 1, \ldots, n$, and so is itself normally distributed. Using Equation 9.3.1, the mean and 
variance of $B$ are computed as follows: 



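$$
\begin{aligned}
E[B] &= \frac{\sum_i (x_i - \bar{x})E[Y_i]}{\sum_i x_i^2 - n\bar{x}^2} \\
&= \frac{\sum_i (x_i - \bar{x})(\alpha + \beta x_i)}{\sum_i x_i^2 - n\bar{x}^2} \\
&= \frac{\alpha\sum_i (x_i - \bar{x}) + \beta\sum_i x_i(x_i - \bar{x})}{\sum_i x_i^2 - n\bar{x}^2} \\
&= \beta\,\frac{\sum_i x_i^2 - \bar{x}\sum_i x_i}{\sum_i x_i^2 - n\bar{x}^2} \qquad \text{since } \sum_i (x_i - \bar{x}) = 0 \\
&= \beta
\end{aligned}
$$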



Thus $E[B] = \beta$ and so $B$ is an unbiased estimator of $\beta$. We will now compute the variance 
of $B$. 



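$$
\begin{aligned}
\operatorname{Var}(B) &= \frac{\operatorname{Var}\left(\sum_{i=1}^n (x_i - \bar{x})Y_i\right)}{\left(\sum_{i=1}^n x_i^2 - n\bar{x}^2\right)^2} \\
&= \frac{\sum_{i=1}^n (x_i - \bar{x})^2 \operatorname{Var}(Y_i)}{\left(\sum_{i=1}^n x_i^2 - n\bar{x}^2\right)^2} \qquad \text{by independence} \\
&= \frac{\sigma^2 \sum_{i=1}^n (x_i - \bar{x})^2}{\left(\sum_{i=1}^n x_i^2 - n\bar{x}^2\right)^2} \\
&= \frac{\sigma^2}{\sum_{i=1}^n x_i^2 - n\bar{x}^2}
\end{aligned}
\tag{9.3.2}
$$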



where the final equality results from the use of the identity 

$$\sum_{i=1}^n (x_i - \bar{x})^2 = \sum_{i=1}^n x_i^2 - n\bar{x}^2$$






Chapter 9: Regression 



Using Equation 9.3.1 along with the relationship 

$$A = \sum_{i=1}^n \frac{Y_i}{n} - B\bar{x}$$



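shows that $A$ can also be expressed as a linear combination of the independent normal 
random variables $Y_i$, $i = 1, \ldots, n$, and is thus also normally distributed. Its mean is 
obtained from 

$$
\begin{aligned}
E[A] &= \sum_{i=1}^n \frac{E[Y_i]}{n} - \bar{x}E[B] \\
&= \sum_{i=1}^n \frac{\alpha + \beta x_i}{n} - \bar{x}\beta \\
&= \alpha + \beta\bar{x} - \bar{x}\beta \\
&= \alpha
\end{aligned}
$$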

Thus $A$ is also an unbiased estimator. The variance of $A$ is computed by first expressing 
$A$ as a linear combination of the $Y_i$. The result (whose details are left as an exercise) is that 

$$\operatorname{Var}(A) = \frac{\sigma^2 \sum_{i=1}^n x_i^2}{n\left(\sum_{i=1}^n x_i^2 - n\bar{x}^2\right)} \tag{9.3.3}$$



The quantities $Y_i - A - Bx_i$, $i = 1, \ldots, n$, which represent the differences between the 
actual responses (that is, the $Y_i$) and their least squares estimators (that is, $A + Bx_i$), are 
called the residuals. The sum of squares of the residuals 

$$SS_R = \sum_{i=1}^n (Y_i - A - Bx_i)^2$$

can be utilized to estimate the unknown error variance $\sigma^2$. Indeed, it can be shown that 

$$\frac{SS_R}{\sigma^2} \sim \chi^2_{n-2}$$

That is, $SS_R/\sigma^2$ has a chi-square distribution with $n - 2$ degrees of freedom, which implies 
that 

$$E\left[\frac{SS_R}{\sigma^2}\right] = n - 2 \qquad \text{or} \qquad E\left[\frac{SS_R}{n-2}\right] = \sigma^2$$

Thus $SS_R/(n - 2)$ is an unbiased estimator of $\sigma^2$. In addition, it can be shown that $SS_R$ 
is independent of the pair $A$ and $B$. 
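A simulation can sanity-check these claims. The results to check are the unbiasedness of $A$, $B$ and $SS_R/(n-2)$, and the variances in Equations 9.3.2 and 9.3.3. The Python sketch below is only an illustration. The input values $x_i = 1, \ldots, 8$, the settings $\alpha = 2$, $\beta = 0.5$, $\sigma = 1$, the seed, and the number of replications are hypothetical choices, not taken from the text.

```python
import random
import statistics


def fit(xs, ys):
    """Return the least squares estimates A, B and the residual sum of squares SS_R."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    a = y_bar - b * x_bar
    ss_r = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return a, b, ss_r


# Hypothetical settings, chosen only for illustration.
ALPHA, BETA, SIGMA = 2.0, 0.5, 1.0
xs = [1, 2, 3, 4, 5, 6, 7, 8]
REPS = 20000

n = len(xs)
x_bar = sum(xs) / n
s_xx = sum((x - x_bar) ** 2 for x in xs)

rng = random.Random(12345)  # fixed seed so that runs are reproducible
a_draws, b_draws, s2_draws = [], [], []
for _ in range(REPS):
    # Y_i ~ N(alpha + beta * x_i, sigma^2), independently
    ys = [ALPHA + BETA * x + rng.gauss(0.0, SIGMA) for x in xs]
    a, b, ss_r = fit(xs, ys)
    a_draws.append(a)
    b_draws.append(b)
    s2_draws.append(ss_r / (n - 2))

var_a_theory = SIGMA ** 2 * sum(x * x for x in xs) / (n * s_xx)  # Equation 9.3.3
var_b_theory = SIGMA ** 2 / s_xx                                  # Equation 9.3.2
print("E[B]         ", statistics.fmean(b_draws), "theory", BETA)
print("Var(B)       ", statistics.variance(b_draws), "theory", var_b_theory)
print("E[A]         ", statistics.fmean(a_draws), "theory", ALPHA)
print("Var(A)       ", statistics.variance(a_draws), "theory", var_a_theory)
print("E[SS_R/(n-2)]", statistics.fmean(s2_draws), "theory", SIGMA ** 2)
```

With 20000 replications, the simulated means and variances should land close to, but not exactly on, the theoretical values.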

REMARKS 

A plausibility argument as to why $SS_R/\sigma^2$ might have a chi-square distribution with $n - 2$ 
degrees of freedom and be independent of $A$ and $B$ runs as follows. Because the $Y_i$ are 
independent normal random variables, it follows that $(Y_i - E[Y_i])/\sqrt{\operatorname{Var}(Y_i)}$, $i = 1, \ldots, n$, 
are independent standard normals and so 

$$\sum_{i=1}^n \frac{(Y_i - E[Y_i])^2}{\operatorname{Var}(Y_i)} = \sum_{i=1}^n \frac{(Y_i - \alpha - \beta x_i)^2}{\sigma^2} \sim \chi^2_n$$

Now if we substitute the estimators $A$ and $B$ for $\alpha$ and $\beta$, then 2 degrees of freedom 
are lost, and so it is not an altogether surprising result that $SS_R/\sigma^2$ has a chi-square 
distribution with $n - 2$ degrees of freedom. 

The fact that $SS_R$ is independent of $A$ and $B$ is quite similar to the fundamental result 
that in normal sampling $\bar{X}$ and $S^2$ are independent. Indeed this latter result states that 
if $Y_1, \ldots, Y_n$ is a normal sample with population mean $\mu$ and variance $\sigma^2$, then if in the 
sum of squares $\sum_{i=1}^n (Y_i - \mu)^2/\sigma^2$, which has a chi-square distribution with $n$ degrees 
of freedom, one substitutes the estimator $\bar{Y}$ for $\mu$ to obtain the new sum of squares 
$\sum_i (Y_i - \bar{Y})^2/\sigma^2$, then this quantity [equal to $(n - 1)S^2/\sigma^2$] will be independent of 
$\bar{Y}$ and will have a chi-square distribution with $n - 1$ degrees of freedom. Since $SS_R/\sigma^2$ 
is obtained by substituting the estimators $A$ and $B$ for $\alpha$ and $\beta$ in the sum of squares 
$\sum_{i=1}^n (Y_i - \alpha - \beta x_i)^2/\sigma^2$, it is not unreasonable to expect that this quantity might be 
independent of $A$ and $B$. 

When the $Y_i$ are normal random variables, the least squares estimators are also the 
maximum likelihood estimators. To verify this remark, note that the joint density of 
$Y_1, \ldots, Y_n$ is given by 

$$
\begin{aligned}
f_{Y_1,\ldots,Y_n}(y_1, \ldots, y_n) &= \prod_{i=1}^n f_{Y_i}(y_i) \\
&= \prod_{i=1}^n \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(y_i - \alpha - \beta x_i)^2/2\sigma^2} \\
&= \frac{1}{(2\pi)^{n/2}\sigma^n} e^{-\sum_{i=1}^n (y_i - \alpha - \beta x_i)^2/2\sigma^2}
\end{aligned}
$$

Consequently, the maximum likelihood estimators of $\alpha$ and $\beta$ are precisely the values of $\alpha$ 
and $\beta$ that minimize $\sum_{i=1}^n (y_i - \alpha - \beta x_i)^2$. That is, they are the least squares estimators. 



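If we let 

Notation 

$$S_{xY} = \sum_{i=1}^n (x_i - \bar{x})(Y_i - \bar{Y}) = \sum_{i=1}^n x_iY_i - n\bar{x}\bar{Y}$$

$$S_{xx} = \sum_{i=1}^n (x_i - \bar{x})^2 = \sum_{i=1}^n x_i^2 - n\bar{x}^2$$

$$S_{YY} = \sum_{i=1}^n (Y_i - \bar{Y})^2 = \sum_{i=1}^n Y_i^2 - n\bar{Y}^2$$

then the least squares estimators can be expressed as 

$$B = \frac{S_{xY}}{S_{xx}}, \qquad A = \bar{Y} - B\bar{x}$$

The following computational identity for $SS_R$, the sum of squares of the residuals, can 
be established. 

Computational Identity for $SS_R$

$$SS_R = \frac{S_{xx}S_{YY} - S_{xY}^2}{S_{xx}} \tag{9.3.4}$$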
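Identity 9.3.4 can be checked numerically. The Python sketch below uses the Example 9.2a data to compute $SS_R$ in two ways: directly from the residuals, and from $S_{xx}$, $S_{xY}$, $S_{YY}$ using the computational forms in the Notation box. The two values should agree up to floating-point rounding.

```python
humidity = [46, 53, 29, 61, 36, 39, 47, 49, 52, 38, 55, 32, 57, 54, 44]
moisture = [12, 15, 7, 17, 10, 11, 11, 12, 14, 9, 16, 8, 18, 14, 12]

n = len(humidity)
x_bar = sum(humidity) / n
y_bar = sum(moisture) / n
# Computational forms from the Notation box
s_xx = sum(x * x for x in humidity) - n * x_bar ** 2
s_xy = sum(x * y for x, y in zip(humidity, moisture)) - n * x_bar * y_bar
s_yy = sum(y * y for y in moisture) - n * y_bar ** 2

b = s_xy / s_xx          # B = S_xY / S_xx
a = y_bar - b * x_bar    # A = Ybar - B * xbar

ss_r_direct = sum((y - a - b * x) ** 2 for x, y in zip(humidity, moisture))
ss_r_identity = (s_xx * s_yy - s_xy ** 2) / s_xx   # Equation 9.3.4
print(f"S_xx = {s_xx:.2f}, S_xY = {s_xy:.1f}, S_YY = {s_yy:.1f}")
print(f"SS_R directly = {ss_r_direct:.4f}, via identity = {ss_r_identity:.4f}")
```

Both routes should reproduce the value $SS_R = 13.08$ reported in Figure 9.3.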



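The following proposition sums up the results of this section. 

PROPOSITION 9.3.1 Suppose that the responses $Y_i$, $i = 1, \ldots, n$, are independent normal 
random variables with means $\alpha + \beta x_i$ and common variance $\sigma^2$. The least squares 
estimators of $\beta$ and $\alpha$, 

$$B = \frac{S_{xY}}{S_{xx}}, \qquad A = \bar{Y} - B\bar{x}$$

are distributed as follows: 

$$A \sim N\left(\alpha, \frac{\sigma^2\sum_i x_i^2}{nS_{xx}}\right), \qquad B \sim N\left(\beta, \frac{\sigma^2}{S_{xx}}\right)$$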
In addition, if we let 

$$SS_R = \sum_i (Y_i - A - Bx_i)^2$$

denote the sum of squares of the residuals, then 

$$\frac{SS_R}{\sigma^2} \sim \chi^2_{n-2}$$

and $SS_R$ is independent of the least squares estimators $A$ and $B$. Also, $SS_R$ can be computed 
from 

$$SS_R = \frac{S_{xx}S_{YY} - (S_{xY})^2}{S_{xx}}$$

Program 9.2 will compute the least squares estimators $A$ and $B$ as well as $\bar{x}$, $\sum_i x_i^2$, 
$S_{xx}$, $S_{xY}$, $S_{YY}$, and $SS_R$. 

EXAMPLE 9.3a The following data relate $x$, the moisture of a wet mix of a certain product, 
to $Y$, the density of the finished product. 

$x_i$    5     6     7      10     12     15     18     20
$y_i$    7.4   9.3   10.6   15.4   18.1   22.2   24.1   24.8

Fit a linear curve to these data. Also determine $SS_R$. 

SOLUTION A plot of the data and the estimated regression line is shown in Figure 9.4. 
To solve the foregoing, run Program 9.2; results are shown in Figure 9.5. ■
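Without Program 9.2 at hand, a short Python sketch can reproduce the main quantities of Figure 9.5 from the eight data pairs. It uses the formulas of Proposition 9.3.1 and Equation 9.3.4. This is a stand-in, not the book's program.

```python
x = [5, 6, 7, 10, 12, 15, 18, 20]                   # moisture of the wet mix
y = [7.4, 9.3, 10.6, 15.4, 18.1, 22.2, 24.1, 24.8]  # density of the finished product

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_yy = sum((yi - y_bar) ** 2 for yi in y)

b = s_xy / s_xx                           # B = S_xY / S_xx
a = y_bar - b * x_bar                     # A = Ybar - B * xbar
ss_r = (s_xx * s_yy - s_xy ** 2) / s_xx   # Equation 9.3.4

print(f"a = {a:.2f}, b = {b:.2f}")
print(f"S(x, Y) = {s_xy:.2f}, S(x, x) = {s_xx:.2f}, S(Y, Y) = {s_yy:.2f}, SS_R = {ss_r:.2f}")
```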



9.4 STATISTICAL INFERENCES ABOUT THE 
REGRESSION PARAMETERS 

Using Proposition 9.3.1, it is a simple matter to devise hypothesis tests and confidence 
intervals for the regression parameters. 
FIGURE 9.4 Example 9.3a: plot of the data and the estimated regression line.



9.4.1 Inferences Concerning $\beta$

An important hypothesis to consider regarding the simple linear regression model 

$$Y = \alpha + \beta x + e$$

is the hypothesis that $\beta = 0$. Its importance derives from the fact that it is equivalent to 
stating that the mean response does not depend on the input, or, equivalently, that there 
is no regression on the input variable. To test 

$$H_0: \beta = 0 \quad \text{versus} \quad H_1: \beta \neq 0$$

note that, from Proposition 9.3.1, 

$$\frac{B - \beta}{\sqrt{\sigma^2/S_{xx}}} = \sqrt{S_{xx}}\,\frac{B - \beta}{\sigma} \sim N(0, 1) \tag{9.4.1}$$

and is independent of 

$$\frac{SS_R}{\sigma^2} \sim \chi^2_{n-2}$$

Hence, from the definition of a $t$-random variable it follows that 

$$\frac{\sqrt{S_{xx}}\,(B - \beta)/\sigma}{\sqrt{\dfrac{SS_R}{\sigma^2(n-2)}}} = \sqrt{\frac{(n-2)S_{xx}}{SS_R}}\,(B - \beta) \sim t_{n-2} \tag{9.4.2}$$
FIGURE 9.5 Output of Program 9.2 for Example 9.3a (Simple Linear Regression, sample size = 8; data points (5, 7.4), (6, 9.3), (7, 10.6), (10, 15.4), (12, 18.1), (15, 22.2), (18, 24.1), (20, 24.8)): the least squares estimators are a = 2.46 and b = 1.21; average x value = 11.63; sum of squares of the x values = 1303.0; the estimated regression line is Y = 2.46 + 1.21x; S(x, Y) = 267.66, S(x, x) = 221.88, S(Y, Y) = 332.37, $SS_R$ = 9.47.



That is, $\sqrt{(n-2)S_{xx}/SS_R}\,(B - \beta)$ has a $t$-distribution with $n - 2$ degrees of freedom. 
Therefore, if $H_0$ is true (and so $\beta = 0$), then 

$$\sqrt{\frac{(n-2)S_{xx}}{SS_R}}\,B \sim t_{n-2}$$

which gives rise to the following test of $H_0$. 
Hypothesis Test of $H_0$: $\beta = 0$

A significance level $\gamma$ test of $H_0$ is to 

reject $H_0$ if $\sqrt{\dfrac{(n-2)S_{xx}}{SS_R}}\,|B| > t_{\gamma/2,n-2}$

accept $H_0$ otherwise 

This test can be performed by first computing the value of the test statistic 
$\sqrt{(n-2)S_{xx}/SS_R}\,|B|$ (call its value $v$) and then rejecting $H_0$ if the desired significance 
level is at least as large as 

$$p\text{-value} = P\{|T_{n-2}| > v\} = 2P\{T_{n-2} > v\}$$

where $T_{n-2}$ is a $t$-random variable with $n - 2$ degrees of freedom. This latter probability 
can be obtained by using Program 5.8.2a. 
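Program 5.8.2a is not reproduced here. As a stand-in, the $p$-value can be approximated by numerically integrating the $t$ density. The Python sketch below uses composite Simpson's rule on $[0, v]$ and the fact that $P\{T > v\} = 1/2 - P\{0 < T < v\}$. The function names are our own, and the step count is an arbitrary choice. If SciPy is available, `scipy.stats.t.sf(v, df)` returns the same upper-tail probability.

```python
import math


def t_pdf(t, df):
    """Density of a t-random variable with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def t_upper_tail(v, df, steps=2000):
    """P{T_df > v} for v >= 0: one-half minus the integral of the density over [0, v].

    The integral is approximated with composite Simpson's rule (steps must be even).
    """
    h = v / steps
    total = t_pdf(0.0, df) + t_pdf(v, df)
    for k in range(1, steps):
        total += (4 if k % 2 == 1 else 2) * t_pdf(k * h, df)
    return 0.5 - total * h / 3


def two_sided_p_value(v, df):
    """p-value = P{|T_df| > v} = 2 P{T_df > v}."""
    return 2 * t_upper_tail(abs(v), df)


print(t_upper_tail(4.032, 5))       # close to .005, consistent with t_{.005,5} = 4.032
print(two_sided_p_value(8.139, 5))  # p-value for a test statistic of 8.139 with 5 degrees of freedom
```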

EXAMPLE 9.4a An individual claims that the fuel consumption of his automobile does 
not depend on how fast the car is driven. To test the plausibility of this hypothesis, the 
car was tested at various speeds between 45 and 75 miles per hour. The miles per gallon 
attained at each of these speeds was determined, with the following data resulting: 

Speed              45    50    55    60    65    70    75
Miles per gallon   24.2  25.0  23.3  22.0  21.5  20.6  19.8

Do these data refute the claim that the mileage per gallon of gas is unaffected by the speed 
at which the car is being driven? 

SOLUTION Suppose that a simple linear regression model 

$$Y = \alpha + \beta x + e$$

relates $Y$, the miles per gallon of the car, to $x$, the speed at which it is being driven. Now, 
the claim being made is that the regression coefficient $\beta$ is equal to 0. To see if the data 
are strong enough to refute this claim, we need to see if it leads to a rejection of the null 
hypothesis when testing 

$$H_0: \beta = 0 \quad \text{versus} \quad H_1: \beta \neq 0$$
To compute the value of the test statistic, we first compute the values of $S_{xx}$, $S_{YY}$, and $S_{xY}$. 
A hand calculation yields that 

$$S_{xx} = 700, \qquad S_{YY} = 21.757, \qquad S_{xY} = -119$$

Using Equation 9.3.4 gives 

$$SS_R = \frac{S_{xx}S_{YY} - S_{xY}^2}{S_{xx}} = \frac{700(21.757) - (119)^2}{700} = 1.527$$

Because 

$$B = S_{xY}/S_{xx} = -119/700 = -.17$$

the value of the test statistic is 

$$TS = \sqrt{5(700)/1.527}\,(.17) = 8.139$$

Since, from Table A2 of the Appendix, $t_{.005,5} = 4.032$, it follows that the hypothesis 
$\beta = 0$ is rejected at the 1 percent level of significance. Thus, the claim that the mileage 
does not depend on the speed at which the car is driven is rejected; there is strong evidence 
that increased speeds lead to decreased mileages. ■ 
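The hand calculation can be reproduced with a short Python sketch. The critical value $t_{.005,5} = 4.032$ is hard-coded from the text rather than computed.

```python
import math

speed = [45, 50, 55, 60, 65, 70, 75]              # x, miles per hour
mpg = [24.2, 25.0, 23.3, 22.0, 21.5, 20.6, 19.8]  # Y, miles per gallon

n = len(speed)
x_bar = sum(speed) / n
y_bar = sum(mpg) / n
s_xx = sum((x - x_bar) ** 2 for x in speed)
s_yy = sum((y - y_bar) ** 2 for y in mpg)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(speed, mpg))

b = s_xy / s_xx
ss_r = (s_xx * s_yy - s_xy ** 2) / s_xx          # Equation 9.3.4
ts = math.sqrt((n - 2) * s_xx / ss_r) * abs(b)   # test statistic for H0: beta = 0

T_005_5 = 4.032  # t_{.005,5}, the table value quoted in the text
reject_at_1_percent = ts > T_005_5
print(f"S_xx = {s_xx:.0f}, S_YY = {s_yy:.3f}, S_xY = {s_xy:.0f}")
print(f"SS_R = {ss_r:.3f}, B = {b:.2f}, TS = {ts:.3f}, reject: {reject_at_1_percent}")
```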

A confidence interval estimator for $\beta$ is easily obtained from Equation 9.4.2. Indeed, 
it follows from Equation 9.4.2 that for any $\alpha$, $0 < \alpha < 1$, 

$$P\left\{-t_{\alpha/2,n-2} < \sqrt{\frac{(n-2)S_{xx}}{SS_R}}\,(B - \beta) < t_{\alpha/2,n-2}\right\} = 1 - \alpha$$

or, equivalently, 

$$P\left\{B - \sqrt{\frac{SS_R}{(n-2)S_{xx}}}\,t_{\alpha/2,n-2} < \beta < B + \sqrt{\frac{SS_R}{(n-2)S_{xx}}}\,t_{\alpha/2,n-2}\right\} = 1 - \alpha$$

which yields the following. 

Confidence Interval for $\beta$

A $100(1 - \alpha)$ percent confidence interval estimator of $\beta$ is 

$$\left(B - \sqrt{\frac{SS_R}{(n-2)S_{xx}}}\,t_{\alpha/2,n-2},\ \ B + \sqrt{\frac{SS_R}{(n-2)S_{xx}}}\,t_{\alpha/2,n-2}\right)$$




REMARK 

The result that 

$$\sqrt{\frac{S_{xx}}{\sigma^2}}\,(B - \beta) \sim N(0, 1)$$

cannot be immediately applied to make inferences about $\beta$ since it involves the unknown 
parameter $\sigma^2$. Instead, what we do is use the preceding statistic with $\sigma^2$ replaced by its 
estimator $SS_R/(n - 2)$, which has the effect of changing the distribution of the statistic 
from the standard normal to the $t$-distribution with $n - 2$ degrees of freedom. 

EXAMPLE 9.4b Derive a 95 percent confidence interval estimate of $\beta$ in Example 9.4a. 

SOLUTION Since $t_{.025,5} = 2.571$, it follows from the computations of this example that 
the 95 percent confidence interval is 

$$-.170 \pm 2.571\sqrt{\frac{1.527}{3500}} = -.170 \pm .054$$

That is, we can be 95 percent confident that $\beta$ lies between $-.224$ and $-.116$. ■ 
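The same interval can be computed directly from the Example 9.4a data. In the Python sketch below, the value $t_{.025,5} = 2.571$ is taken from the text, not computed.

```python
import math

speed = [45, 50, 55, 60, 65, 70, 75]
mpg = [24.2, 25.0, 23.3, 22.0, 21.5, 20.6, 19.8]

n = len(speed)
x_bar, y_bar = sum(speed) / n, sum(mpg) / n
s_xx = sum((x - x_bar) ** 2 for x in speed)
s_yy = sum((y - y_bar) ** 2 for y in mpg)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(speed, mpg))
b = s_xy / s_xx
ss_r = (s_xx * s_yy - s_xy ** 2) / s_xx

T_025_5 = 2.571  # t_{.025,5}, as quoted in the text
half_width = T_025_5 * math.sqrt(ss_r / ((n - 2) * s_xx))
lo, hi = b - half_width, b + half_width
print(f"B = {b:.3f}, half-width = {half_width:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```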

9.4.1.1 REGRESSION TO THE MEAN

The term regression was originally employed by Francis Galton while describing the laws 
of inheritance. Galton believed that these laws caused population extremes to "regress 
toward the mean." By this he meant that children of individuals having extreme values 
of a certain characteristic would tend to have less extreme values of this characteristic 
than their parent. 

If we assume a linear regression relationship between the characteristic of the offspring 
($Y$) and that of the parent ($x$), then a regression to the mean will occur when 
the regression parameter $\beta$ is between 0 and 1. That is, if 

$$E[Y] = \alpha + \beta x$$

and $0 < \beta < 1$, then $E[Y]$ will be smaller than $x$ when $x$ is large and greater than $x$ 
when $x$ is small. That this statement is true can be easily checked either algebraically or 
by plotting the two straight lines 

$$y = \alpha + \beta x \qquad \text{and} \qquad y = x$$

A plot indicates that, when $0 < \beta < 1$, the line $y = \alpha + \beta x$ is above the line $y = x$ for 
small values of $x$ and is below it for large values of $x$. 
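The algebraic check mentioned above takes one line. The following short derivation is our own, not quoted from the text. Subtracting $x$ from the regression line gives

$$(\alpha + \beta x) - x = \alpha - (1 - \beta)x, \qquad 0 < \beta < 1,$$

which is positive for $x < \alpha/(1 - \beta)$ and negative for $x > \alpha/(1 - \beta)$. So the two lines cross at $x^* = \alpha/(1 - \beta)$, with $E[Y] > x$ to the left of that point and $E[Y] < x$ to the right of it.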
FIGURE 9.6 Scatter diagram of son's height versus father's height.



EXAMPLE 9.4c To illustrate Galton's thesis of regression to the mean, the British statistician 
Karl Pearson plotted the heights of 10 randomly chosen sons versus those of their fathers. 
The resulting data (in inches) were as follows. 

Fathers' height   60    62    64    65    66    67    68    70    72    74
Sons' height      63.6  65.2  66    65.5  66.9  67.1  67.4  68.3  70.1  70



A scatter diagram representing these data is presented in Figure 9.6. 

Note that whereas the data appear to indicate that taller fathers tend to have taller 
sons, they also appear to indicate that the sons of fathers who are either extremely short or 
extremely tall tend to be more "average" than their fathers; that is, there is a "regression 
toward the mean." 

We will determine whether the preceding data are strong enough to prove that there is 
a regression toward the mean by taking this statement as the alternative hypothesis. That 
is, we will use the above data to test 

$$H_0: \beta \geq 1 \quad \text{versus} \quad H_1: \beta < 1$$

which is equivalent to a test of 

$$H_0: \beta = 1 \quad \text{versus} \quad H_1: \beta < 1$$
It now follows from Equation 9.4.2 that when $\beta = 1$, the test statistic 

$$TS = \sqrt{8S_{xx}/SS_R}\,(B - 1)$$

has a $t$-distribution with 8 degrees of freedom. The significance level $\alpha$ test will reject $H_0$ 
when the value of $TS$ is sufficiently small (since this will occur when $B$, the estimator of 
$\beta$, is sufficiently smaller than 1). Specifically, the test is to 

reject $H_0$ if $\sqrt{8S_{xx}/SS_R}\,(B - 1) < -t_{\alpha,8}$

Program 9.2 gives that 

$$\sqrt{8S_{xx}/SS_R}\,(B - 1) = 30.2794(.4646 - 1) = -16.21$$

Since $t_{.01,8} = 2.896$, we see that 

$$TS < -t_{.01,8}$$

and so the null hypothesis that $\beta \geq 1$ is rejected at the 1 percent level of significance. In 
fact, the $p$-value is 

$$p\text{-value} = P\{T_8 \leq -16.213\} \approx 0$$

and so the null hypothesis that $\beta \geq 1$ is rejected at almost any significance level, thus 
establishing a regression toward the mean (see Figure 9.7). 
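The Python sketch below recomputes the test from the tabulated heights, with the critical value $t_{.01,8} = 2.896$ taken from the text. It reproduces $B = .4646$. It gives $SS_R \approx 1.494$ and $TS \approx -16.23$, close to but not identical with the Program 9.2 figures quoted above (1.497 and $-16.21$). We have not tracked down the source of this small difference, and the conclusion is unaffected.

```python
import math

fathers = [60, 62, 64, 65, 66, 67, 68, 70, 72, 74]               # x, inches
sons = [63.6, 65.2, 66, 65.5, 66.9, 67.1, 67.4, 68.3, 70.1, 70]  # Y, inches

n = len(fathers)
x_bar, y_bar = sum(fathers) / n, sum(sons) / n
s_xx = sum((x - x_bar) ** 2 for x in fathers)
s_yy = sum((y - y_bar) ** 2 for y in sons)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(fathers, sons))

b = s_xy / s_xx
ss_r = (s_xx * s_yy - s_xy ** 2) / s_xx
ts = math.sqrt((n - 2) * s_xx / ss_r) * (b - 1)   # test statistic for H0: beta = 1

T_01_8 = 2.896  # t_{.01,8}, as quoted in the text
print(f"B = {b:.4f}, SS_R = {ss_r:.4f}, TS = {ts:.2f}, reject H0 at 1%: {ts < -T_01_8}")
```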

A modern biological explanation for the regression to the mean phenomenon would 
roughly go along the lines of noting that as an offspring obtains a random selection of 
one-half of its parents' genes, it follows that the offspring of, say, a very tall parent would, 
by chance, tend to have fewer "tall" genes than its parent. 

FIGURE 9.7 Example 9.4c (horizontal axis: father's height): for x small, y > x; for x large, y < x.

While the most important applications of the regression to the mean phenomenon 
concern the relationship between the biological characteristics of an offspring and that 
of its parents, this phenomenon also arises in situations where we have two sets of data 
referring to the same variables. ■ 

EXAMPLE 9.4d The data of Table 9.1 relate the number of motor vehicle deaths occurring 
in 12 counties in the northwestern United States in the years 1988 and 1989. 

A glance at Figure 9.8 indicates that in 1989 there was, for the most part, a reduction in 
the number of deaths in those counties that had a large number of motor deaths in 1988. 
Similarly, there appears to have been an increase in those counties that had a low value in 
1988. Thus, we would expect that a regression to the mean is in effect. In fact, running 
Program 9.2 yields that the estimated regression equation is 

$$y = 74.589 + .276x$$

showing that the estimated value of $\beta$ indeed appears to be less than 1. 
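The regression equation can be recovered from Table 9.1 with a few lines of Python, as a stand-in for Program 9.2. Here $x$ is the 1988 count and $y$ the 1989 count for each county.

```python
deaths_1988 = [121, 96, 85, 113, 102, 118, 90, 84, 107, 112, 95, 101]   # x
deaths_1989 = [104, 91, 101, 110, 117, 108, 96, 102, 114, 96, 88, 106]  # y

n = len(deaths_1988)
x_bar = sum(deaths_1988) / n
y_bar = sum(deaths_1989) / n
s_xx = sum((x - x_bar) ** 2 for x in deaths_1988)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(deaths_1988, deaths_1989))

b = s_xy / s_xx          # estimated slope; a value below 1 suggests regression to the mean
a = y_bar - b * x_bar
print(f"y = {a:.3f} + {b:.3f}x")
```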

One must be careful when considering the reason behind the regression to the mean 
phenomenon in the preceding data. For instance, it might be natural to suppose that 
those counties that had a large number of deaths caused by motor vehicles in 1988 would 
have made a large effort — perhaps by improving the safety of their roads or by making 
people more aware of the potential dangers of unsafe driving — to reduce this number. 
In addition, we might suppose that those counties that had the fewest number of deaths 
in 1988 might have "rested on their laurels" and not made much of an effort to further 
improve their numbers — and as a result had an increase in the number of casualties the 
following year. 

TABLE 9.1 Motor Vehicle Deaths, Northwestern United States, 1988 and 1989 

County   Deaths in 1988   Deaths in 1989
1        121              104
2        96               91
3        85               101
4        113              110
5        102              117
6        118              108
7        90               96
8        84               102
9        107              114
10       112              96
11       95               88
12       101              106
FIGURE 9.8 Scatter diagram of 1989 deaths versus 1988 deaths.

While the above supposition might be correct, it is important to realize that a regression 
to the mean would probably have occurred even if none of the counties had done anything 
out of the ordinary. Indeed, it could very well be the case that those counties having large 
numbers of casualties in 1988 were just very unlucky in that year and thus a decrease 
in the next year was just a return to a more normal result for them. (For an analogy, if 
9 heads result when 10 fair coins are flipped, then it is quite likely that another flip of these 
10 coins will result in fewer than 9 heads.) Similarly, those counties having few deaths in 
1988 might have been "lucky" that year and a more normal result in 1989 would thus lead 
to an increase. 
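The coin analogy can be made concrete. Whatever happened on the first flip, the number of heads on a fresh flip of 10 fair coins is binomial with parameters 10 and 1/2, so

$$P\{\text{fewer than 9 heads}\} = 1 - \frac{\binom{10}{9} + \binom{10}{10}}{2^{10}} = 1 - \frac{11}{1024} \approx .989$$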

The mistaken belief that regression to the mean is due to some outside influence when 
it is in reality just due to "chance" occurs frequently enough that it is often referred to as 
the regression fallacy. ■ 



9.4.2 Inferences Concerning $\alpha$

The determination of confidence intervals and hypothesis tests for $\alpha$ is accomplished in 
exactly the same manner as was done for $\beta$. Specifically, Proposition 9.3.1 can be used to 
show that 

$$\sqrt{\frac{n(n-2)S_{xx}}{\sum_i x_i^2\,SS_R}}\,(A - \alpha) \sim t_{n-2} \tag{9.4.3}$$

which leads to the following confidence interval estimator of $\alpha$. 






Confidence Interval Estimator of $\alpha$

The $100(1 - \alpha)$ percent confidence interval for $\alpha$ is the interval 

$$A \pm \sqrt{\frac{\sum_i x_i^2\,SS_R}{n(n-2)S_{xx}}}\,t_{\alpha/2,n-2}$$



Hypothesis tests concerning $\alpha$ are easily obtained from Equation 9.4.3, and their 
development is left as an exercise. 
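The text works no numerical example for $\alpha$. Purely as an illustration of the interval above, the Python sketch below applies it to the Example 9.4a speed/mileage data. The value $t_{.025,5} = 2.571$ is the one quoted in Example 9.4b.

```python
import math

speed = [45, 50, 55, 60, 65, 70, 75]
mpg = [24.2, 25.0, 23.3, 22.0, 21.5, 20.6, 19.8]

n = len(speed)
x_bar, y_bar = sum(speed) / n, sum(mpg) / n
s_xx = sum((x - x_bar) ** 2 for x in speed)
s_yy = sum((y - y_bar) ** 2 for y in mpg)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(speed, mpg))
b = s_xy / s_xx
a = y_bar - b * x_bar
ss_r = (s_xx * s_yy - s_xy ** 2) / s_xx
sum_x2 = sum(x * x for x in speed)

T_025_5 = 2.571  # t_{.025,5}
half_width = T_025_5 * math.sqrt(sum_x2 * ss_r / (n * (n - 2) * s_xx))
print(f"A = {a:.3f}, 95% CI for alpha: ({a - half_width:.3f}, {a + half_width:.3f})")
```

The interval is wide relative to its centre because $x = 0$ lies far outside the observed speeds. This echoes the caution in this section against extrapolating far from the input levels used.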

9.4.3 Inferences Concerning the Mean Response $\alpha + \beta x_0$

It is often of interest to use the data pairs $(x_i, Y_i)$, $i = 1, \ldots, n$, to estimate $\alpha + \beta x_0$, the 
mean response for a given input level $x_0$. If it is a point estimator that is desired, then the 
natural estimator is $A + Bx_0$, which is an unbiased estimator since 

$$E[A + Bx_0] = E[A] + x_0E[B] = \alpha + \beta x_0$$

However, if we desire a confidence interval, or are interested in testing some hypothesis 
about this mean response, then it is necessary to first determine the probability distribution 
of the estimator $A + Bx_0$. We now do so. 

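Using the expression for $B$ given by Equation 9.3.1 yields that 

$$B = c\sum_{i=1}^n (x_i - \bar{x})Y_i$$

where 

$$c = \frac{1}{\sum_{i=1}^n x_i^2 - n\bar{x}^2} = \frac{1}{S_{xx}}$$

Since $A = \bar{Y} - B\bar{x}$, we see that 

$$A + Bx_0 = \bar{Y} - B(\bar{x} - x_0) = \sum_{i=1}^n Y_i\left[\frac{1}{n} - c(x_i - \bar{x})(\bar{x} - x_0)\right]$$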



Since the $Y_i$ are independent normal random variables, the foregoing equation shows 
that $A + Bx_0$ can be expressed as a linear combination of independent normal random 
variables and is thus itself normally distributed. Because we already know its mean, we 
need only compute its variance, which is accomplished as follows: 



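$$
\begin{aligned}
\operatorname{Var}(A + Bx_0) &= \sum_{i=1}^n \left[\frac{1}{n} - c(x_i - \bar{x})(\bar{x} - x_0)\right]^2 \operatorname{Var}(Y_i) \\
&= \sigma^2 \sum_{i=1}^n \left[\frac{1}{n^2} + c^2(\bar{x} - x_0)^2(x_i - \bar{x})^2 - 2c(x_i - \bar{x})\frac{\bar{x} - x_0}{n}\right] \\
&= \sigma^2\left[\frac{1}{n} + c^2(\bar{x} - x_0)^2\sum_{i=1}^n (x_i - \bar{x})^2 - 2c(\bar{x} - x_0)\sum_{i=1}^n \frac{x_i - \bar{x}}{n}\right] \\
&= \sigma^2\left[\frac{1}{n} + \frac{(\bar{x} - x_0)^2}{S_{xx}}\right]
\end{aligned}
$$

where the last equality followed from 

$$\sum_{i=1}^n (x_i - \bar{x})^2 = \sum_{i=1}^n x_i^2 - n\bar{x}^2 = \frac{1}{c} = S_{xx}, \qquad \sum_{i=1}^n (x_i - \bar{x}) = 0$$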



Hence, we have shown that 

$$A + Bx_0 \sim N\left(\alpha + \beta x_0,\ \sigma^2\left[\frac{1}{n} + \frac{(\bar{x} - x_0)^2}{S_{xx}}\right]\right) \tag{9.4.4}$$



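In addition, because $A + Bx_0$ is independent of 

$$\frac{SS_R}{\sigma^2} \sim \chi^2_{n-2}$$

it follows that 

$$\frac{A + Bx_0 - (\alpha + \beta x_0)}{\sqrt{\dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{S_{xx}}}\,\sqrt{\dfrac{SS_R}{n-2}}} \sim t_{n-2} \tag{9.4.5}$$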



Equation 9.4.5 can now be used to obtain the following confidence interval estimator of 
$\alpha + \beta x_0$. 

Confidence Interval Estimator of $\alpha + \beta x_0$

With $100(1 - \alpha)$ percent confidence, $\alpha + \beta x_0$ will lie within 

$$A + Bx_0 \pm \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}\,\sqrt{\frac{SS_R}{n-2}}\,t_{\alpha/2,n-2}$$
EXAMPLE 9.4e Using the data of Example 9.4c, determine a 95 percent confidence interval 
for the average height of all males whose fathers are 68 inches tall. 

SOLUTION Since the observed values are 

$$n = 10, \quad x_0 = 68, \quad \bar{x} = 66.8, \quad S_{xx} = 171.6, \quad SS_R = 1.49721$$

we see that 

$$\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}\,\sqrt{\frac{SS_R}{n-2}} \approx .1424$$

Also, because 

$$t_{.025,8} = 2.306, \qquad A + Bx_0 = 67.5675$$

we obtain the following 95 percent confidence interval, 

$$\alpha + \beta x_0 \in (67.239, 67.896)$$

■
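The arithmetic of this example can be reproduced in a few lines of Python. The sketch works from the summary values quoted in the solution ($n$, $x_0$, $\bar{x}$, $S_{xx}$, $SS_R$, $A + Bx_0$ and $t_{.025,8}$) rather than from the raw heights.

```python
import math

# Summary values quoted in Example 9.4e
n, x0, x_bar, s_xx, ss_r = 10, 68, 66.8, 171.6, 1.49721
fitted = 67.5675   # A + B * x0
T_025_8 = 2.306    # t_{.025,8}

# Estimated standard deviation of A + B*x0 (see Equation 9.4.5)
se_mean = math.sqrt(1 / n + (x0 - x_bar) ** 2 / s_xx) * math.sqrt(ss_r / (n - 2))
lo, hi = fitted - T_025_8 * se_mean, fitted + T_025_8 * se_mean
print(f"factor = {se_mean:.4f}, 95% CI for alpha + beta*x0 = ({lo:.3f}, {hi:.3f})")
```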

9.4.4 Prediction Interval of a Future Response 

It is often the case that it is more important to estimate the actual value of a future response 
rather than its mean value. For instance, if an experiment is to be performed at temperature 
level $x_0$, then we would probably be more interested in predicting $Y(x_0)$, the yield from this 
experiment, than we would be in estimating the expected yield $E[Y(x_0)] = \alpha + \beta x_0$. 
(On the other hand, if a series of experiments were to be performed at input level $x_0$, then 
we would probably want to estimate $\alpha + \beta x_0$, the mean yield.) 

Suppose first that we are interested in a single value (as opposed to an interval) to use 
as a predictor of $Y(x_0)$, the response at level $x_0$. Now, it is clear that the best predictor of 
$Y(x_0)$ is its mean value $\alpha + \beta x_0$. [Actually, this is not so immediately obvious since one 
could argue that the best predictor of a random variable is (1) its mean, which minimizes 
the expected square of the difference between the predictor and the actual value; or (2) its 
median, which minimizes the expected absolute difference between the predictor and 
the actual value; or (3) its mode, which is the most likely value to occur. However, as the 
mean, median, and mode of a normal random variable are all equal, and the response is, 
by assumption, normally distributed, there is no doubt in this situation.] Since $\alpha$ and $\beta$ 
are not known, it seems reasonable to use their estimators $A$ and $B$ and thus use $A + Bx_0$ 
as the predictor of a new response at input level $x_0$. 

Let us now suppose that rather than being concerned with determining a single value 
to predict a response, we are interested in finding a prediction interval that, with a given 
degree of confidence, will contain the response. To obtain such an interval, let $Y$ denote 
the future response whose input level is $x_0$ and consider the probability distribution of the 
response minus its predicted value, that is, the distribution of $Y - A - Bx_0$. Now, 



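$$Y \sim N(\alpha + \beta x_0, \sigma^2)$$

and, as was shown in Section 9.4.3, 

$$A + Bx_0 \sim N\left(\alpha + \beta x_0,\ \sigma^2\left[\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right]\right)$$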



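Hence, because $Y$ is independent of the earlier data values $Y_1, Y_2, \ldots, Y_n$ that were used 
to determine $A$ and $B$, it follows that $Y$ is independent of $A + Bx_0$ and so 

$$Y - A - Bx_0 \sim N\left(0,\ \sigma^2\left[1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right]\right)$$

or, equivalently, 

$$\frac{Y - A - Bx_0}{\sigma\sqrt{\dfrac{n+1}{n} + \dfrac{(x_0 - \bar{x})^2}{S_{xx}}}} \sim N(0, 1) \tag{9.4.6}$$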



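Now, using once again the result that $SS_R$ is independent of $A$ and $B$ (and also of $Y$) and 

$$\frac{SS_R}{\sigma^2} \sim \chi^2_{n-2}$$

we obtain, by the usual argument, upon replacing $\sigma^2$ in Equation 9.4.6 by its estimator 
$SS_R/(n - 2)$ that 

$$\frac{Y - A - Bx_0}{\sqrt{\dfrac{n+1}{n} + \dfrac{(x_0 - \bar{x})^2}{S_{xx}}}\,\sqrt{\dfrac{SS_R}{n-2}}} \sim t_{n-2}$$

and so, for any value $\alpha$, $0 < \alpha < 1$, 

$$P\left\{-t_{\alpha/2,n-2} < \frac{Y - A - Bx_0}{\sqrt{\dfrac{n+1}{n} + \dfrac{(x_0 - \bar{x})^2}{S_{xx}}}\,\sqrt{\dfrac{SS_R}{n-2}}} < t_{\alpha/2,n-2}\right\} = 1 - \alpha$$

That is, we have just established the following. 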
Prediction Interval for a Response at the Input Level $x_0$

Based on the response values $Y_i$ corresponding to the input values $x_i$, $i = 1, 2, \ldots, n$: 
With $100(1 - \alpha)$ percent confidence, the response $Y$ at the input level $x_0$ will be contained 
in the interval 

$$A + Bx_0 \pm t_{\alpha/2,n-2}\sqrt{\left[\frac{n+1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right]\frac{SS_R}{n-2}}$$



EXAMPLE 9.4f In Example 9.4c, suppose we want an interval that we can "be 95 percent 
certain" will contain the height of a given male whose father is 68 inches tall. A simple 
computation now yields the prediction interval 

$$Y(68) \in 67.568 \pm 1.050$$

or, with 95 percent confidence, the person's height will be between 66.518 and 
68.618. ■ 
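The "simple computation" can be spelled out in Python. The sketch again uses the summary values quoted in Example 9.4e. For comparison it also computes the factor for the mean response, which shows why the prediction interval is the wider of the two.

```python
import math

# Summary values quoted in Example 9.4e
n, x0, x_bar, s_xx, ss_r = 10, 68, 66.8, 171.6, 1.49721
fitted = 67.5675   # A + B * x0
T_025_8 = 2.306    # t_{.025,8}

# The extra "1" inside the square root accounts for the variance of the new response Y itself
pred_factor = math.sqrt((n + 1) / n + (x0 - x_bar) ** 2 / s_xx) * math.sqrt(ss_r / (n - 2))
mean_factor = math.sqrt(1 / n + (x0 - x_bar) ** 2 / s_xx) * math.sqrt(ss_r / (n - 2))

half_width = T_025_8 * pred_factor
print(f"prediction interval: {fitted:.3f} +/- {half_width:.3f}")
print(f"interval for the mean response: {fitted:.3f} +/- {T_025_8 * mean_factor:.3f}")
```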

REMARKS 

(a) There is often some confusion about the difference between a confidence and a pre- 
diction interval. A confidence interval is an interval that does contain, with a given degree 
of confidence, a fixed parameter of interest. A prediction interval, on the other hand, is an 
interval that will contain, again with a given degree of confidence, a random variable of 
interest. 

(b) One should not make predictions about responses at input levels that are far from 
those used to obtain the estimated regression line. For instance, the data of Example 9.4c 
should not be used to predict the height of a male whose father is 42 inches tall. 

9.4.5 Summary of Distributional Results 

We now summarize the distributional results of this section. 

Model: $Y = \alpha + \beta x + e$, $e \sim N(0, \sigma^2)$
Data: $(x_i, Y_i)$, $i = 1, 2, \ldots, n$

Inferences About / Use the Distributional Result

$\beta$:
$$\sqrt{\frac{(n-2)S_{xx}}{SS_R}}\,(B - \beta) \sim t_{n-2}$$

$\alpha$:
$$\sqrt{\frac{n(n-2)S_{xx}}{\sum_i x_i^2\,SS_R}}\,(A - \alpha) \sim t_{n-2}$$

$\alpha + \beta x_0$:
$$\frac{A + Bx_0 - \alpha - \beta x_0}{\sqrt{\dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{S_{xx}}}\,\sqrt{\dfrac{SS_R}{n-2}}} \sim t_{n-2}$$

$Y(x_0)$:
$$\frac{Y(x_0) - A - Bx_0}{\sqrt{1 + \dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{S_{xx}}}\,\sqrt{\dfrac{SS_R}{n-2}}} \sim t_{n-2}$$



9.5 THE COEFFICIENT OF DETERMINATION AND THE 
SAMPLE CORRELATION COEFFICIENT 

Suppose we wanted to measure the amount of variation in the set of response values 
$Y_1, \ldots, Y_n$ corresponding to the set of input values $x_1, \ldots, x_n$. A standard measure in 
statistics of the amount of variation in a set of values $Y_1, \ldots, Y_n$ is given by the quantity 

$$S_{YY} = \sum_{i=1}^n (Y_i - \bar{Y})^2$$

For instance, if all the $Y_i$ are equal (and thus are all equal to $\bar{Y}$), then $S_{YY}$ would 
equal 0. 

The variation in the values of the $Y_i$ arises from two factors. First, because the input 
values $x_i$ are different, the response variables $Y_i$ all have different mean values, which will 
result in some variation in their values. Second, the variation also arises from the fact 
that even when the differences in the input values are taken into account, each of the 
response variables $Y_i$ has variance $\sigma^2$ and thus will not exactly equal the predicted value 
at its input $x_i$. 

Let us consider now the question as to how much of the variation in the values of the 
response variables is due to the different input values, and how much is due to the inherent 
variance of the responses even when the input values are taken into account. To answer 
this question, note that the quantity 

$$SS_R = \sum_i (Y_i - A - Bx_i)^2$$

measures the remaining amount of variation in the response values after the different input 
values have been taken into account. Thus, 

$$S_{YY} - SS_R$$
represents the amount of variation in the response variables that is explained by the different 
input values; and so the quantity $R^2$ defined by 

$$R^2 = \frac{S_{YY} - SS_R}{S_{YY}} = 1 - \frac{SS_R}{S_{YY}}$$

represents the proportion of the variation in the response variables that is explained by the 
different input values. $R^2$ is called the coefficient of determination. 

The coefficient of determination $R^2$ will have a value between 0 and 1. A value of $R^2$ 
near 1 indicates that most of the variation of the response data is explained by the different 
input values, whereas a value of $R^2$ near 0 indicates that little of the variation is explained 
by the different input values. 

EXAMPLE 9.5a In Example 9.4c, which relates the height of a son to that of his father, the 
output from Program 9.2 yielded that 

$$S_{YY} = 38.521, \qquad SS_R = 1.497$$

Thus, 

$$R^2 = 1 - \frac{1.497}{38.521} = .961$$

In other words, 96 percent of the variation of the heights of the 10 individuals is explained 
by the heights of their fathers. The remaining (unexplained) 4 percent of the variation is 
due to the variance of a son's height even when the father's height is taken into account. 
(That is, it is due to $\sigma^2$, the variance of the error random variable.) ■ 
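The Python sketch below computes $R^2$ directly from the heights tabulated in Example 9.4c. Working from the raw data gives $S_{YY} \approx 38.529$ and $SS_R \approx 1.494$, slightly different from the Program 9.2 values quoted above, but $R^2$ still rounds to .961.

```python
fathers = [60, 62, 64, 65, 66, 67, 68, 70, 72, 74]
sons = [63.6, 65.2, 66, 65.5, 66.9, 67.1, 67.4, 68.3, 70.1, 70]

n = len(fathers)
x_bar, y_bar = sum(fathers) / n, sum(sons) / n
s_xx = sum((x - x_bar) ** 2 for x in fathers)
s_yy = sum((y - y_bar) ** 2 for y in sons)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(fathers, sons))
ss_r = (s_xx * s_yy - s_xy ** 2) / s_xx   # Equation 9.3.4

r_squared = 1 - ss_r / s_yy               # coefficient of determination
print(f"S_YY = {s_yy:.3f}, SS_R = {ss_r:.3f}, R^2 = {r_squared:.3f}")
```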

The value of $R^2$ is often used as an indicator of how well the regression model fits the 
data, with a value near 1 indicating a good fit, and one near 0 indicating a poor fit. In 
other words, if the regression model is able to explain most of the variation in the response 
data, then it is considered to fit the data well. 

Recall that in Section 2.6 we defined the sample correlation coefficient $r$ of the set of 
data pairs $(x_i, Y_i)$, $i = 1, \ldots, n$, by 

$$r = \frac{\sum_{i=1}^n (x_i - \bar{x})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2 \sum_{i=1}^n (Y_i - \bar{Y})^2}}$$

It was noted that $r$ provided a measure of the degree to which high values of $x$ are 
paired with high values of $Y$ and low values of $x$ with low values of $Y$. A value of $r$ 
near $+1$ indicated that large $x$ values were strongly associated with large $Y$ values and 
small $x$ values were strongly associated with small $Y$ values, whereas a value near $-1$ indicated 
that large $x$ values were strongly associated with small $Y$ values and small $x$ values 
with large $Y$ values. 

In the notation of this chapter, 

$$r = \frac{S_{xY}}{\sqrt{S_{xx}S_{YY}}}$$

Upon using identity (9.3.4), 

$$SS_R = \frac{S_{xx}S_{YY} - S_{xY}^2}{S_{xx}}$$

we see that 

$$r^2 = \frac{S_{xY}^2}{S_{xx}S_{YY}} = \frac{S_{xx}S_{YY} - SS_R\,S_{xx}}{S_{xx}S_{YY}} = 1 - \frac{SS_R}{S_{YY}} = R^2$$

That is, 

$$|r| = \sqrt{R^2}$$

and so, except for its sign indicating whether it is positive or negative, the sample correlation 
coefficient is equal to the square root of the coefficient of determination. The sign of 
$r$ is the same as that of $B$. 

The above gives additional meaning to the sample correlation coefficient. For instance, 
if a data set has its sample correlation coefficient $r$ equal to .9, then this implies that a 
simple linear regression model for these data explains 81 percent (since $R^2 = .9^2 = .81$) 
of the variation in the response values. That is, 81 percent of the variation in the response 
values is explained by the different input values. 
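The identity $r^2 = R^2$ can be confirmed numerically. The Python sketch below uses the Example 9.3a data (our choice of data set). It computes $r$ from the Section 2.6 formula and $R^2$ from a directly computed residual sum of squares, so the two sides are obtained independently.

```python
import math

x = [5, 6, 7, 10, 12, 15, 18, 20]
y = [7.4, 9.3, 10.6, 15.4, 18.1, 22.2, 24.1, 24.8]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

r = s_xy / math.sqrt(s_xx * s_yy)   # sample correlation coefficient
b = s_xy / s_xx
# Fitted value A + B*x_i written as y_bar + B*(x_i - x_bar); residual sum computed directly
ss_r = sum((yi - (y_bar + b * (xi - x_bar))) ** 2 for xi, yi in zip(x, y))
r_squared = 1 - ss_r / s_yy         # coefficient of determination

print(f"r = {r:.4f}, r^2 = {r * r:.4f}, R^2 = {r_squared:.4f}")
```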

9.6 ANALYSIS OF RESIDUALS: ASSESSING THE MODEL 

The initial step for ascertaining whether or not the simple linear regression model

$$Y = \alpha + \beta x + e, \qquad e \sim N(0, \sigma^2)$$

is appropriate in a given situation is to investigate the scatter diagram. Indeed, this is often sufficient to convince one that the regression model is or is not correct. When the scatter diagram does not by itself rule out the preceding model, the least squares estimators A and B should be computed and the residuals $Y_i - (A + Bx_i)$, $i = 1, \ldots, n$, analyzed. The analysis begins by normalizing, or standardizing, the residuals by dividing them by $\sqrt{SS_R/(n-2)}$, the estimate of the standard deviation of the $Y_i$. The resulting quantities

$$\frac{Y_i - (A + Bx_i)}{\sqrt{SS_R/(n-2)}}, \qquad i = 1, \ldots, n$$

are called the standardized residuals.

When the simple linear regression model is correct, the standardized residuals are approximately independent standard normal random variables, and thus should be randomly distributed about 0, with about 95 percent of their values between $-2$ and $+2$ (since $P\{-1.96 < Z < 1.96\} = .95$). In addition, a plot of the standardized residuals should not indicate any distinct pattern. Indeed, any indication of a distinct pattern should make one suspicious about the validity of the assumed simple linear regression model.
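A minimal sketch of the standardization step, assuming NumPy. The function name `standardized_residuals` is hypothetical, and the data are simulated from a correct straight-line model (intercept 1, slope 2, unit error variance), so roughly 95 percent of the standardized residuals should fall between -2 and +2.

```python
import numpy as np

def standardized_residuals(x, y):
    """Residuals Y_i - (A + B x_i) divided by sqrt(SS_R / (n - 2))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    Sxx = np.sum((x - x.mean()) ** 2)
    B = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
    A = y.mean() - B * x.mean()
    resid = y - (A + B * x)
    SSR = np.sum(resid ** 2)
    return resid / np.sqrt(SSR / (n - 2))

# Simulated data from Y = 1 + 2x + e, e ~ N(0, 1), so the model is correct here
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 200)
y = 1 + 2 * x + rng.normal(0, 1, size=x.size)
z = standardized_residuals(x, y)
frac_within_2 = np.mean(np.abs(z) <= 2)
print(frac_within_2)  # typically close to .95
```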

Figure 9.9 presents three different scatter diagrams and their associated standardized residuals. The first of these, as indicated both by its scatter diagram and the random nature of its standardized residuals, appears to fit the straight-line model quite well. The second residual plot shows a discernible pattern, in that the residuals appear first to decrease and then to increase as the input level increases. This often means that terms of higher order than linear are needed to describe the relationship between the input and the response. Indeed, this is also indicated by the scatter diagram in this case. The third standardized residual plot also shows a pattern, in that the absolute values of the residuals, and thus their squares, appear to increase as the input level increases. This often indicates that the variance of the response is not constant but, rather, increases with the input level.

FIGURE 9.9 (continued) (b) Random data and regression line.

9.7 TRANSFORMING TO LINEARITY

In many situations, it is clear that the mean response is not a linear function of the input level. In such cases, if the form of the relationship can be determined, it is sometimes possible, by a change of variables, to transform it into a linear form. For instance, in certain applications it is known that W(t), the amplitude of a signal a time t after its origination, is approximately related to t by the functional form

$$W(t) \approx c\,e^{-dt}$$

On taking logarithms, this can be expressed as

$$\log W(t) \approx \log c - dt$$

If we now let

$$Y = \log W(t), \qquad \alpha = \log c, \qquad \beta = -d$$

then the foregoing can be modeled as a regression of the form

$$Y = \alpha + \beta t + e$$

The regression parameters α and β would then be estimated by the usual least squares approach, and the original functional relationship can be predicted from

$$W(t) \approx e^{A + Bt}$$
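The change of variables can be sketched as follows, assuming NumPy and natural logarithms. The helper names and the noise-free signal (c = 2, d = 0.5) are made up for illustration; `np.polyfit(t, Y, 1)` performs the ordinary least squares fit of Y on t.

```python
import numpy as np

def fit_exponential_decay(t, w):
    """Fit W(t) ~ c e^{-dt} by least squares on Y = log W(t) = alpha + beta t."""
    t = np.asarray(t, dtype=float)
    Y = np.log(np.asarray(w, dtype=float))  # natural logarithm
    B, A = np.polyfit(t, Y, 1)              # np.polyfit returns [slope, intercept]
    return A, B

def predict(A, B, t):
    """Predicted amplitude W(t) ~ e^{A + B t}."""
    return np.exp(A + B * np.asarray(t, dtype=float))

# Noise-free illustrative signal with c = 2 and d = 0.5 (made-up values)
t = np.array([0.0, 0.5, 1.0, 2.0, 3.0, 4.0])
w = 2.0 * np.exp(-0.5 * t)
A, B = fit_exponential_decay(t, w)
print(np.exp(A), -B)  # estimates of c and d: 2.0 and 0.5 up to rounding error
```

Since α = log c and β = -d, the fitted parameters map back through c ≈ e^A and d ≈ -B.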



EXAMPLE 9.7a The following table gives the percentages of a chemical that were used up when an experiment was run at various temperatures (in degrees Celsius). Use it to estimate the percentage of the chemical that would be used up if the experiment were to be run at 350 degrees.

Temperature   Percentage
     5°          .061
    10°          .113
    20°          .192
    30°          .259
    40°          .339
    50°          .401
    60°          .461
    80°          .551

FIGURE 9.10 Example 9.7a.



SOLUTION Let P(x) be the percentage of the chemical that is used up when the experiment is run at x degrees. Even though a plot of P(x) looks roughly linear (see Figure 9.10), we can improve upon the fit by considering a nonlinear relationship between x and P(x). Specifically, let us consider a relationship of the form

$$1 - P(x) \approx c(1 - d)^x$$

That is, let us suppose that the percentage of the chemical that survives an experiment run at temperature x decreases approximately at an exponential rate as x increases. Taking logs, the preceding can be written as

$$\log(1 - P(x)) \approx \log(c) + x\log(1 - d)$$

Thus, setting

$$Y = -\log(1 - P), \qquad \alpha = -\log c, \qquad \beta = -\log(1 - d)$$

we obtain the usual regression equation

$$Y = \alpha + \beta x + e$$

TABLE 9.2



Temperature   -log(1 - P)
     5°          .063
    10°          .120
    20°          .213
    30°          .300
    40°          .414
    50°          .512
    60°          .618
    80°          .801

FIGURE 9.11 Transformed data.

To see whether the data support this model, we can plot $-\log(1 - P)$ versus x. The transformed data are presented in Table 9.2 and the graph in Figure 9.11. Running Program 9.2 yields that the least squares estimates of α and β are

$$A = .0154, \qquad B = .0099$$

TABLE 9.3



 x      P      P̂     P - P̂
 5    .061   .063   -.002
10    .113   .109    .004
20    .192   .193   -.001
30    .259   .269   -.010
40    .339   .339    .000
50    .401   .401    .000
60    .461   .458    .003
80    .551   .556   -.005



Transforming this back into the original variables gives that the estimates of c and d are

$$c = e^{-A} = .9847, \qquad 1 - d = e^{-B} = .9901$$

and so the estimated functional relationship is

$$\hat{P} = 1 - .9847(.9901)^x$$

The residuals $P - \hat{P}$ are presented in Table 9.3. ■
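The computations of Example 9.7a can be reproduced with a short NumPy sketch, in which `np.polyfit` stands in for Program 9.2. Natural logarithms are assumed, consistent with $c = e^{-A}$.

```python
import numpy as np

# Data of Example 9.7a: temperature x (degrees C) and proportion P used up
x = np.array([5, 10, 20, 30, 40, 50, 60, 80], dtype=float)
P = np.array([.061, .113, .192, .259, .339, .401, .461, .551])

Y = -np.log(1 - P)            # transformed response -log(1 - P), as in Table 9.2
B, A = np.polyfit(x, Y, 1)    # least squares slope B and intercept A
c = np.exp(-A)                # c = e^{-A}
one_minus_d = np.exp(-B)      # 1 - d = e^{-B}

P_hat = 1 - c * one_minus_d ** x   # fitted P = 1 - c(1 - d)^x
print(round(A, 4), round(B, 4))    # compare with A = .0154, B = .0099
print(np.round(P - P_hat, 3))      # residuals, compare with Table 9.3
# Extrapolated estimate at 350 degrees, far outside the observed 5 to 80 degree range
print(1 - c * one_minus_d ** 350)
```

The residuals agree with Table 9.3 up to the rounding of the estimates. The last line answers the question posed in the example, but because 350 degrees lies far beyond the data, that value should be treated with caution.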

9.8 WEIGHTED LEAST SQUARES 

In the regression model

$$Y = \alpha + \beta x + e$$

it often turns out that the variance of a response is not constant but rather depends on its input level. If these variances are known, at least up to a proportionality constant, then the regression parameters α and β should be estimated by minimizing a weighted sum of squares. Specifically, if

$$\operatorname{Var}(Y_i) = \frac{\sigma^2}{w_i}$$

then the estimators A and B should be chosen to minimize

$$\sum_i w_i\,[Y_i - (A + Bx_i)]^2$$

On taking partial derivatives with respect to A and B and setting them equal to 0, we obtain the following equations for the minimizing A and B:

$$\begin{aligned} \sum_i w_iY_i &= A\sum_i w_i + B\sum_i w_ix_i \\ \sum_i w_ix_iY_i &= A\sum_i w_ix_i + B\sum_i w_ix_i^2 \end{aligned} \tag{9.8.1}$$

These equations are easily solved to yield the least squares estimators.
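Equations 9.8.1 form a 2 × 2 linear system in A and B. Here is a minimal sketch of solving that system, assuming NumPy (the function name `weighted_least_squares` is hypothetical):

```python
import numpy as np

def weighted_least_squares(x, y, w):
    """Solve Equations 9.8.1 for the weighted least squares estimates A and B."""
    x, y, w = (np.asarray(v, dtype=float) for v in (x, y, w))
    # Coefficient matrix and right-hand side of the two normal equations
    M = np.array([[w.sum(), (w * x).sum()],
                  [(w * x).sum(), (w * x * x).sum()]])
    rhs = np.array([(w * y).sum(), (w * x * y).sum()])
    A, B = np.linalg.solve(M, rhs)
    return A, B

# With equal weights the estimates coincide with ordinary least squares
x = [1, 2, 3, 4, 5]
y = [2.0, 2.9, 4.2, 4.8, 6.1]
A_w, B_w = weighted_least_squares(x, y, [1, 1, 1, 1, 1])
B_ols, A_ols = np.polyfit(x, y, 1)
print(A_w, B_w, A_ols, B_ols)
```

Comparing against `np.polyfit` with equal weights is a quick sanity check. Data lying exactly on a line are recovered exactly, whatever the weights.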

EXAMPLE 9.8a To develop a feel for why the estimators should be obtained by minimizing the weighted sum of squares rather than the ordinary sum of squares, consider the following situation. Suppose that $X_1, \ldots, X_n$ are independent normal random variables, each having mean μ and variance σ². Suppose further that the $X_i$ are not directly observable, but rather only $Y_1$ and $Y_2$, defined by

$$Y_1 = X_1 + \cdots + X_k, \qquad Y_2 = X_{k+1} + \cdots + X_n, \qquad k < n$$

are directly observable. Based on $Y_1$ and $Y_2$, how should we estimate μ?

Whereas the best estimator of μ is clearly $\bar{X} = \sum_{i=1}^n X_i/n = (Y_1 + Y_2)/n$, let us see what the ordinary least squares estimator would be. Since

$$E[Y_1] = k\mu, \qquad E[Y_2] = (n-k)\mu$$

the least squares estimator of μ would be that value of μ that minimizes

$$(Y_1 - k\mu)^2 + [Y_2 - (n-k)\mu]^2$$

On differentiating and setting equal to zero, we see that the least squares estimator of μ, call it $\hat{\mu}$, is such that

$$-2k(Y_1 - k\hat{\mu}) - 2(n-k)[Y_2 - (n-k)\hat{\mu}] = 0$$

or

$$[k^2 + (n-k)^2]\,\hat{\mu} = kY_1 + (n-k)Y_2$$

or

$$\hat{\mu} = \frac{kY_1 + (n-k)Y_2}{k^2 + (n-k)^2}$$

Thus we see that while the ordinary least squares estimator is an unbiased estimator of μ, since

$$E[\hat{\mu}] = \frac{kE[Y_1] + (n-k)E[Y_2]}{k^2 + (n-k)^2} = \frac{k^2\mu + (n-k)^2\mu}{k^2 + (n-k)^2} = \mu$$

it is not the best estimator $\bar{X}$.

Now let us determine the estimator produced by minimizing the weighted sum of squares. That is, let us determine the value of μ, call it $\mu_w$, that minimizes

$$\frac{(Y_1 - k\mu)^2}{\operatorname{Var}(Y_1)} + \frac{[Y_2 - (n-k)\mu]^2}{\operatorname{Var}(Y_2)}$$

Since

$$\operatorname{Var}(Y_1) = k\sigma^2, \qquad \operatorname{Var}(Y_2) = (n-k)\sigma^2$$

this is equivalent to choosing μ to minimize

$$\frac{(Y_1 - k\mu)^2}{k} + \frac{[Y_2 - (n-k)\mu]^2}{n-k}$$

Upon differentiating and then equating to 0, we see that $\mu_w$, the minimizing value, satisfies

$$-\frac{2k(Y_1 - k\mu_w)}{k} - \frac{2(n-k)[Y_2 - (n-k)\mu_w]}{n-k} = 0$$

or

$$Y_1 + Y_2 = n\mu_w$$

or

$$\mu_w = \frac{Y_1 + Y_2}{n}$$

That is, the weighted least squares estimator is indeed the preferred estimator $(Y_1 + Y_2)/n = \bar{X}$. ■
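The comparison in Example 9.8a can be checked both exactly and by simulation. This sketch assumes NumPy. The helper names, the choice n = 10 and k = 2, and the values μ = 3 and σ = 1 are illustrative. The exact variance of $\hat{\mu}$ follows from $\operatorname{Var}(Y_1) = k\sigma^2$ and $\operatorname{Var}(Y_2) = (n-k)\sigma^2$.

```python
import numpy as np

def estimator_variances(n, k, sigma2=1.0):
    """Exact variances of the ordinary (mu_hat) and weighted (mu_w) least squares estimators."""
    denom = k ** 2 + (n - k) ** 2
    # mu_hat = [k Y1 + (n-k) Y2] / denom, with Var(Y1) = k sigma^2 and Var(Y2) = (n-k) sigma^2
    var_ols = (k ** 2 * k + (n - k) ** 2 * (n - k)) * sigma2 / denom ** 2
    # mu_w = (Y1 + Y2) / n, which is X-bar
    var_wls = sigma2 / n
    return var_ols, var_wls

def simulate(n, k, mu=3.0, sigma=1.0, reps=200_000, seed=0):
    """Simulate both estimators from Y1 = X_1 + ... + X_k and Y2 = X_{k+1} + ... + X_n."""
    rng = np.random.default_rng(seed)
    X = rng.normal(mu, sigma, size=(reps, n))
    Y1 = X[:, :k].sum(axis=1)
    Y2 = X[:, k:].sum(axis=1)
    mu_hat = (k * Y1 + (n - k) * Y2) / (k ** 2 + (n - k) ** 2)
    mu_w = (Y1 + Y2) / n
    return mu_hat, mu_w

v_ols, v_wls = estimator_variances(10, 2)
print(v_ols, v_wls)                 # 0.1124... and 0.1
mu_hat, mu_w = simulate(10, 2)
print(mu_hat.mean(), mu_w.mean())   # both near mu = 3 (both estimators are unbiased)
print(mu_hat.var(), mu_w.var())     # the weighted estimator has the smaller variance
```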

REMARKS 

(a) Assuming normally distributed data, the weighted least squares estimators are precisely the maximum likelihood estimators. This follows because the joint density of the data $Y_1, \ldots, Y_n$ is

$$f_{Y_1,\ldots,Y_n}(y_1, \ldots, y_n) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi}\,(\sigma/\sqrt{w_i})}\, e^{-(y_i - \alpha - \beta x_i)^2/(2\sigma^2/w_i)} = \frac{\sqrt{w_1 \cdots w_n}}{(2\pi)^{n/2}\sigma^n}\, e^{-\sum_{i=1}^n w_i(y_i - \alpha - \beta x_i)^2/(2\sigma^2)}$$

Consequently, the maximum likelihood estimators of α and β are precisely the values of α and β that minimize the weighted sum of squares $\sum_{i=1}^n w_i(y_i - \alpha - \beta x_i)^2$.

(b) The weighted sum of squares can also be seen as the relevant quantity to be minimized by multiplying the regression equation

$$Y = \alpha + \beta x + e$$

by $\sqrt{w}$. This results in the equation

$$Y\sqrt{w} = \alpha\sqrt{w} + \beta x\sqrt{w} + e\sqrt{w}$$

Now, in this latter equation the error term $e\sqrt{w}$ has mean 0 and constant variance. Hence, the natural least squares estimators of α and β would be the values of A and B that minimize

$$\sum_i \left(Y_i\sqrt{w_i} - A\sqrt{w_i} - Bx_i\sqrt{w_i}\right)^2 = \sum_i w_i(Y_i - A - Bx_i)^2$$
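Remark (b) can be verified numerically. Ordinary least squares on the rescaled equation (response $Y_i\sqrt{w_i}$, regressors $\sqrt{w_i}$ and $x_i\sqrt{w_i}$, no separate intercept column) should give the same estimates as Equations 9.8.1. The sketch assumes NumPy. The data are simulated with error variance proportional to x, so the weights are $w_i = 1/x_i$; all values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(1, 10, 30)
w = 1.0 / x                                       # Var(Y_i) = sigma^2 / w_i = sigma^2 x_i
y = 3.0 + 1.5 * x + rng.normal(0, np.sqrt(x))     # errors with variance proportional to x

# (1) Ordinary least squares on Y sqrt(w) = alpha sqrt(w) + beta x sqrt(w) + e sqrt(w)
sw = np.sqrt(w)
design = np.column_stack([sw, x * sw])
(A1, B1), *_ = np.linalg.lstsq(design, y * sw, rcond=None)

# (2) Direct solution of the weighted normal equations (9.8.1)
M = np.array([[w.sum(), (w * x).sum()], [(w * x).sum(), (w * x * x).sum()]])
rhs = np.array([(w * y).sum(), (w * x * y).sum()])
A2, B2 = np.linalg.solve(M, rhs)
print(A1 - A2, B1 - B2)  # both differences are at the level of floating-point error
```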



(c) The weighted least squares approach puts the greatest emphasis on those data pairs 
having the greatest weights (and thus the smallest variance in their error term). ■ 

At this point it might appear that the weighted least squares approach is not particularly useful, since it requires knowing, up to a constant, the variance of a response at an arbitrary input level. However, by analyzing the model that generates the data, it is often possible to determine these values. The following two examples illustrate this.

EXAMPLE 9.8b The following data represent travel times in a downtown area of a certain 
city. The independent, or input, variable is the distance to be traveled. 



Distance (miles)   Travel time (minutes)
      .5                 15.0
      1                  15.1
      1.5                16.5
      2                  19.9
      3                  27.7
      4                  29.7
      5                  26.7
      6                  35.9
      8                  42
     10                  49.4



Assuming a linear relationship of the form

$$Y = \alpha + \beta x + e$$

between Y, the travel time, and x, the distance, how should we estimate α and β? To utilize the weighted least squares approach, we need to know, up to a multiplicative constant, the variance of Y as a function of x. We will now present an argument that Var(Y) should be proportional to x.

SOLUTION Let d denote the length of a city block. Thus a trip of distance x will consist of x/d blocks. If we let $Y_i$, $i = 1, \ldots, x/d$, denote the time it takes to traverse block i, then the total travel time can be expressed as

$$Y = Y_1 + Y_2 + \cdots + Y_{x/d}$$

Now in many applications it is probably reasonable to suppose that the $Y_i$ are independent random variables with a common variance, and thus

$$\begin{aligned} \operatorname{Var}(Y) &= \operatorname{Var}(Y_1) + \cdots + \operatorname{Var}(Y_{x/d}) \\ &= (x/d)\operatorname{Var}(Y_1) \qquad \text{since } \operatorname{Var}(Y_i) = \operatorname{Var}(Y_1) \\ &= x\sigma^2, \qquad \text{where } \sigma^2 = \operatorname{Var}(Y_1)/d \end{aligned}$$

Thus, it would seem that the estimators A and B should be chosen so as to minimize

$$\sum_i \frac{(Y_i - A - Bx_i)^2}{x_i}$$

Using the preceding data with the weights $w_i = 1/x_i$, the least squares Equations 9.8.1 are

$$\begin{aligned} 104.22 &= 5.34A + 10B \\ 277.9 &= 10A + 41B \end{aligned}$$

which yield the solution

$$A = 12.561, \qquad B = 3.714$$

A graph of the estimated regression line $12.561 + 3.714x$ along with the data points is presented in Figure 9.12. As a qualitative check of our solution, note that the regression line fits the data pairs best when the input levels are small, which is as it should be since the weights are inversely proportional to the inputs. ■
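The normal equations of Example 9.8b can be formed and solved directly. Here is a sketch, assuming NumPy:

```python
import numpy as np

# Travel-time data of Example 9.8b
x = np.array([.5, 1, 1.5, 2, 3, 4, 5, 6, 8, 10])                          # distance (miles)
y = np.array([15.0, 15.1, 16.5, 19.9, 27.7, 29.7, 26.7, 35.9, 42, 49.4])  # travel time (minutes)
w = 1.0 / x                                                               # weights w_i = 1/x_i

# Coefficients of Equations 9.8.1
M = np.array([[w.sum(), (w * x).sum()],
              [(w * x).sum(), (w * x * x).sum()]])
rhs = np.array([(w * y).sum(), (w * x * y).sum()])
A, B = np.linalg.solve(M, rhs)
print(np.round(M, 3), np.round(rhs, 2))  # about [[5.342, 10], [10, 41]] and [104.22, 277.9]
print(A, B)
```

With the unrounded coefficient $\sum_i w_i \approx 5.3417$ the solution is A ≈ 12.55 and B ≈ 3.72. The values A = 12.561 and B = 3.714 in the text correspond to rounding that coefficient to 5.34, as `np.linalg.solve([[5.34, 10], [10, 41]], [104.22, 277.9])` confirms.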

EXAMPLE 9.8c Consider the relationship between Y, the number of accidents on a heavily traveled highway, and x, the number of cars traveling on the highway. After a little thought it would probably seem to most that the linear model

$$Y = \alpha + \beta x + e$$

would be appropriate. However, as there does not appear to be any a priori reason why Var(Y) should not depend on the input level x, it is not clear that we would be justified in using the ordinary least squares approach to estimate α and β. Indeed, we will now argue that a weighted least squares approach with weights 1/x should be employed; that is, we should choose A and B to minimize

$$\sum_i \frac{(Y_i - A - Bx_i)^2}{x_i}$$

FIGURE 9.12 Example 9.8b (horizontal axis: distance in miles).



The rationale behind this claim is that it seems reasonable to suppose that Y has approximately a Poisson distribution. This is so because we can imagine that each of the x cars will have a small probability of causing an accident, and so, for large x, the number of accidents should be approximately a Poisson random variable. Since the variance of a Poisson random variable is equal to its mean, we see that

$$\begin{aligned} \operatorname{Var}(Y) &\approx E[Y] \qquad \text{since } Y \text{ is approximately Poisson} \\ &= \alpha + \beta x \\ &\approx \beta x \qquad \text{for large } x \end{aligned}$$

■
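The Poisson approximation behind Example 9.8c can be illustrated by simulation. If each of x cars independently causes an accident with a small probability p, the number of accidents is binomial with mean px and variance px(1 - p), so the variance is nearly equal to the mean and grows linearly in x. The value of p and the traffic volumes below are hypothetical; the sketch assumes NumPy.

```python
import numpy as np

rng = np.random.default_rng(3)
p = 0.001                                 # hypothetical accident probability per car
results = {}
for x in [1_000, 5_000, 20_000]:          # hypothetical numbers of cars
    # Each car independently causes an accident with probability p
    counts = rng.binomial(x, p, size=200_000)
    results[x] = (counts.mean(), counts.var())
    print(x, results[x])                  # mean and variance both close to p * x
```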



REMARKS 

(a) Another technique that is often employed when the variance of the response depends on the input level is to attempt to stabilize the variance by an appropriate transformation. For example, if Y is a Poisson random variable with mean λ, then it can be shown [see Remark (b)] that $\sqrt{Y}$ has approximate variance .25 no matter what the value of λ. Based on this fact, one might try to model $E[\sqrt{Y}]$ as a linear function of the input. That is, one might consider the model

$$\sqrt{Y} = \alpha + \beta x + e$$

(b) Proof that $\operatorname{Var}(\sqrt{Y}) \approx .25$ when Y is Poisson with mean λ. Consider the Taylor series expansion of $g(y) = \sqrt{y}$ about the value λ. By ignoring all terms beyond the second derivative term, we obtain that

$$g(y) \approx g(\lambda) + g'(\lambda)(y - \lambda) + \frac{g''(\lambda)(y - \lambda)^2}{2} \tag{9.8.2}$$

Since

$$g'(x) = \tfrac{1}{2}x^{-1/2}, \qquad g''(x) = -\tfrac{1}{4}x^{-3/2}$$

we obtain, on evaluating Equation 9.8.2 at y = Y, that

$$\sqrt{Y} \approx \sqrt{\lambda} + \tfrac{1}{2}\lambda^{-1/2}(Y - \lambda) - \tfrac{1}{8}\lambda^{-3/2}(Y - \lambda)^2$$

Taking expectations, and using the results that

$$E[Y - \lambda] = 0, \qquad E[(Y - \lambda)^2] = \operatorname{Var}(Y) = \lambda$$

yields that

$$E[\sqrt{Y}] \approx \sqrt{\lambda} - \frac{1}{8\sqrt{\lambda}}$$

Hence

$$(E[\sqrt{Y}])^2 \approx \lambda + \frac{1}{64\lambda} - \frac{1}{4} \approx \lambda - \frac{1}{4}$$

and so

$$\operatorname{Var}(\sqrt{Y}) = E[Y] - (E[\sqrt{Y}])^2 \approx \lambda - \left(\lambda - \frac{1}{4}\right) = \frac{1}{4}$$
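The approximation $\operatorname{Var}(\sqrt{Y}) \approx .25$ can be checked by computing $E[\sqrt{Y}]$ and $E[Y]$ exactly from the Poisson probability mass function. This pure-Python sketch (the function name is hypothetical) truncates the sum far out in the upper tail, where the remaining probability is negligible.

```python
import math

def var_sqrt_poisson(lam):
    """Var(sqrt(Y)) for Y ~ Poisson(lam), summing the probability mass function."""
    upper = int(lam + 20 * math.sqrt(lam) + 50)  # tail beyond this point is negligible
    p = math.exp(-lam)                           # P{Y = 0}
    m1 = m2 = 0.0                                # running E[sqrt(Y)] and E[Y]
    for k in range(upper + 1):
        m1 += math.sqrt(k) * p
        m2 += k * p
        p *= lam / (k + 1)                       # P{Y = k+1} from P{Y = k}
    return m2 - m1 ** 2

for lam in [5, 20, 100]:
    print(lam, var_sqrt_poisson(lam))  # moves toward .25 as lam grows
```

The computed variances approach .25 as λ grows, consistent with the Taylor series argument, which drops terms that become negligible for large λ.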
9.9 POLYNOMIAL REGRESSION

In situations where the functional relationship between the response Y and the independent variable x cannot be adequately approximated by a linear relationship, it is sometimes possible to obtain a reasonable fit by considering a polynomial relationship. That is, we might try to fit to the data set a functional relationship of the form

$$Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_r x^r + e$$

where $\beta_0, \beta_1, \ldots, \beta_r$ are regression coefficients that would have to be estimated. If the data set consists of the n pairs $(x_i, Y_i)$, $i = 1, \ldots, n$, then the least squares estimators of $\beta_0, \ldots, \beta_r$, call them $B_0, \ldots, B_r$, are those values that minimize

$$\sum_{i=1}^n (Y_i - B_0 - B_1x_i - B_2x_i^2 - \cdots - B_rx_i^r)^2$$

To determine these estimators, we take partial derivatives with respect to $B_0, \ldots, B_r$ of the foregoing sum of squares, and then set these equal to 0 so as to determine the minimizing values. On doing so, and then rearranging the resulting equations, we obtain that the least squares estimators $B_0, B_1, \ldots, B_r$ satisfy the following set of r + 1 linear equations, called the normal equations.



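$$\begin{aligned}
\sum_{i=1}^n Y_i &= B_0n + B_1\sum_{i=1}^n x_i + B_2\sum_{i=1}^n x_i^2 + \cdots + B_r\sum_{i=1}^n x_i^r \\
\sum_{i=1}^n x_iY_i &= B_0\sum_{i=1}^n x_i + B_1\sum_{i=1}^n x_i^2 + B_2\sum_{i=1}^n x_i^3 + \cdots + B_r\sum_{i=1}^n x_i^{r+1} \\
\sum_{i=1}^n x_i^2Y_i &= B_0\sum_{i=1}^n x_i^2 + B_1\sum_{i=1}^n x_i^3 + \cdots + B_r\sum_{i=1}^n x_i^{r+2} \\
&\;\;\vdots \\
\sum_{i=1}^n x_i^rY_i &= B_0\sum_{i=1}^n x_i^r + B_1\sum_{i=1}^n x_i^{r+1} + \cdots + B_r\sum_{i=1}^n x_i^{2r}
\end{aligned}$$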
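The normal equations above can be built directly from power sums and solved. The following sketch assumes NumPy; `poly_normal_equations` is a hypothetical helper. For higher degrees the matrix of power sums becomes badly conditioned, which is one reason library routines such as `np.polyfit` solve the least squares problem by other means.

```python
import numpy as np

def poly_normal_equations(x, y, r):
    """Build and solve the r + 1 normal equations for a degree-r polynomial fit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Entry (j, m) of the matrix is sum_i x_i^(j+m); entry j of the right side is sum_i x_i^j Y_i
    M = np.array([[np.sum(x ** (j + m)) for m in range(r + 1)] for j in range(r + 1)])
    rhs = np.array([np.sum(x ** j * y) for j in range(r + 1)])
    return np.linalg.solve(M, rhs)  # B_0, B_1, ..., B_r

# Noise-free illustrative data from Y = 1 - 2x + 0.5x^2
x = np.arange(1, 8)
y = 1 - 2 * x + 0.5 * x ** 2
print(poly_normal_equations(x, y, 2))  # close to [1, -2, 0.5]
print(np.polyfit(x, y, 2)[::-1])       # same coefficients, lowest degree first
```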



In fitting a polynomial to a set of data pairs, it is often possible to determine the necessary degree of the polynomial by a study of the scatter diagram. We emphasize that one should always use the lowest degree that appears to describe the data adequately. [Thus, for instance, whereas it is usually possible to find a polynomial of degree n - 1 that passes through all the n pairs $(x_i, Y_i)$, $i = 1, \ldots, n$, it would be hard to ascribe much confidence to such a fit.]

Even more so than in linear regression, it is extremely risky to use a polynomial fit to predict the value of a response at an input level $x_0$ that is far away from the input levels $x_i$, $i = 1, \ldots, n$, used in finding the polynomial fit. (For one thing, the polynomial fit may be valid only in a region around the $x_i$, $i = 1, \ldots, n$, that does not include $x_0$.)

EXAMPLE 9.9a Fit a polynomial to the following data.

 x       Y
 1     20.6
 2     30.8
 3     55
 4     71.4
 5     97.3
 6    131.8
 7    156.5
 8    197.3
 9    238.7
10    291.7



SOLUTION A plot of these data (see Figure 9.13) indicates that a quadratic relationship

$$Y = \beta_0 + \beta_1x + \beta_2x^2 + e$$

might hold. Since

$$\sum_i x_i = 55, \qquad \sum_i x_i^2 = 385, \qquad \sum_i x_i^3 = 3{,}025, \qquad \sum_i x_i^4 = 25{,}333$$

$$\sum_i Y_i = 1{,}291.1, \qquad \sum_i x_iY_i = 9{,}549.3, \qquad \sum_i x_i^2Y_i = 77{,}758.9$$

the least squares estimates are the solution of the following set of equations.

$$\begin{aligned}
1{,}291.1 &= 10B_0 + 55B_1 + 385B_2 \\
9{,}549.3 &= 55B_0 + 385B_1 + 3{,}025B_2 \\
77{,}758.9 &= 385B_0 + 3{,}025B_1 + 25{,}333B_2
\end{aligned} \tag{9.9.1}$$

Solving these equations (see the remark following this example) yields that the least squares estimates are

$$B_0 = 12.59326, \qquad B_1 = 6.326172, \qquad B_2 = 2.122818$$

Thus, the estimated quadratic regression equation is

$$Y = 12.59 + 6.33x + 2.12x^2$$

This equation, along with the data, is plotted in Figure 9.14. ■

REMARK

In matrix notation, Equation 9.9.1 can be written as

$$\begin{bmatrix} 1{,}291.1 \\ 9{,}549.3 \\ 77{,}758.9 \end{bmatrix} = \begin{bmatrix} 10 & 55 & 385 \\ 55 & 385 & 3{,}025 \\ 385 & 3{,}025 & 25{,}333 \end{bmatrix} \begin{bmatrix} B_0 \\ B_1 \\ B_2 \end{bmatrix}$$

which has the solution

$$\begin{bmatrix} B_0 \\ B_1 \\ B_2 \end{bmatrix} = \begin{bmatrix} 10 & 55 & 385 \\ 55 & 385 & 3{,}025 \\ 385 & 3{,}025 & 25{,}333 \end{bmatrix}^{-1} \begin{bmatrix} 1{,}291.1 \\ 9{,}549.3 \\ 77{,}758.9 \end{bmatrix}$$
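Equation 9.9.1 can be solved numerically, for instance with NumPy's `np.linalg.solve`, which avoids forming the inverse matrix explicitly. In this sketch the response at x = 7 is taken as 156.5, the value that reproduces the sums quoted in the solution of Example 9.9a.

```python
import numpy as np

x = np.arange(1, 11, dtype=float)
# Response at x = 7 taken as 156.5, consistent with the sums quoted in the solution
y = np.array([20.6, 30.8, 55, 71.4, 97.3, 131.8, 156.5, 197.3, 238.7, 291.7])

sums = [x.sum(), (x ** 2).sum(), (x ** 3).sum(), (x ** 4).sum(),
        y.sum(), (x * y).sum(), (x ** 2 * y).sum()]
print(sums)  # 55, 385, 3025, 25333, 1291.1, 9549.3, 77758.9 (up to float rounding)

M = np.array([[10, 55, 385], [55, 385, 3025], [385, 3025, 25333]], dtype=float)
rhs = np.array([1291.1, 9549.3, 77758.9])
B = np.linalg.solve(M, rhs)   # solves M B = rhs without computing M^{-1}
print(B)
```

Solving the system gives $B_0 \approx 12.593$, $B_1 \approx 6.326$, $B_2 \approx 2.123$, which agree with the estimates in Example 9.9a to about three decimal places.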



*9.10 MULTIPLE LINEAR REGRESSION

In most applications, the response of an experiment can be predicted more adequately on the basis of a collection of independent input variables than on the basis of a single one. Indeed, a typical situation is one in which there is a set of, say, k input variables and the response Y is related to them by the relation

$$Y = \beta_0 + \beta_1x_1 + \cdots + \beta_kx_k + e$$

where $x_j$, $j = 1, \ldots, k$, is the level of the jth input variable and e is a random error that we shall assume is normally distributed with mean 0 and (constant) variance σ². The parameters $\beta_0, \beta_1, \ldots, \beta_k$ and σ² are assumed to be unknown and must be estimated from the data, which we shall suppose will consist of the values of $Y_1, \ldots, Y_n$, where $Y_i$ is the response level corresponding to the k input levels $x_{i1}, x_{i2}, \ldots, x_{ik}$. That is, the $Y_i$ are related to these input levels through

$$E[Y_i] = \beta_0 + \beta_1x_{i1} + \beta_2x_{i2} + \cdots + \beta_kx_{ik}$$

If we let $B_0, B_1, \ldots, B_k$ denote estimators of $\beta_0, \ldots, \beta_k$, then the sum of the squared differences between the $Y_i$ and their estimated expected values is

$$\sum_{i=1}^n (Y_i - B_0 - B_1x_{i1} - B_2x_{i2} - \cdots - B_kx_{ik})^2$$

The least squares estimators are those values of $B_0, B_1, \ldots, B_k$ that minimize the foregoing.

* Optional section.

To determine the least squares estimators, we repeatedly take partial derivatives of the preceding sum of squares, first with respect to $B_0$, then to $B_1$, $\ldots$, then to $B_k$. On equating these k + 1 partial derivatives to 0, we obtain the following set of equations:

$$\begin{aligned}
\sum_{i=1}^n (Y_i - B_0 - B_1x_{i1} - B_2x_{i2} - \cdots - B_kx_{ik}) &= 0 \\
\sum_{i=1}^n x_{i1}(Y_i - B_0 - B_1x_{i1} - \cdots - B_kx_{ik}) &= 0 \\
\sum_{i=1}^n x_{i2}(Y_i - B_0 - B_1x_{i1} - \cdots - B_kx_{ik}) &= 0 \\
&\;\;\vdots \\
\sum_{i=1}^n x_{ik}(Y_i - B_0 - B_1x_{i1} - \cdots - B_kx_{ik}) &= 0
\end{aligned}$$



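Rewriting these equations yields that the least squares estimators $B_0, B_1, \ldots, B_k$ satisfy the following set of linear equations, called the normal equations:

$$\begin{aligned}
\sum_{i=1}^n Y_i &= nB_0 + B_1\sum_{i=1}^n x_{i1} + B_2\sum_{i=1}^n x_{i2} + \cdots + B_k\sum_{i=1}^n x_{ik} \\
\sum_{i=1}^n x_{i1}Y_i &= B_0\sum_{i=1}^n x_{i1} + B_1\sum_{i=1}^n x_{i1}^2 + B_2\sum_{i=1}^n x_{i1}x_{i2} + \cdots + B_k\sum_{i=1}^n x_{i1}x_{ik} \\
&\;\;\vdots \\
\sum_{i=1}^n x_{ik}Y_i &= B_0\sum_{i=1}^n x_{ik} + B_1\sum_{i=1}^n x_{ik}x_{i1} + B_2\sum_{i=1}^n x_{ik}x_{i2} + \cdots + B_k\sum_{i=1}^n x_{ik}^2
\end{aligned} \tag{9.10.1}$$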



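Before solving the normal equations, it is convenient to introduce matrix notation. If we let

$$Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}, \qquad X = \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{bmatrix}, \qquad \beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}, \qquad e = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix}$$

then Y is an n × 1, X an n × p, β a p × 1, and e an n × 1 matrix, where p = k + 1.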
The multiple regression model can now be written as

$$Y = X\beta + e$$

In addition, if we let

$$B = \begin{bmatrix} B_0 \\ B_1 \\ \vdots \\ B_k \end{bmatrix}$$

be the matrix of least squares estimators, then the normal Equations 9.10.1 can be written as

$$X'XB = X'Y \tag{9.10.2}$$

where X' is the transpose of X.

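To see that Equation 9.10.2 is equivalent to the normal Equations 9.10.1, note that

$$X'X = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_{11} & x_{21} & \cdots & x_{n1} \\ x_{12} & x_{22} & \cdots & x_{n2} \\ \vdots & \vdots & & \vdots \\ x_{1k} & x_{2k} & \cdots & x_{nk} \end{bmatrix} \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{bmatrix} = \begin{bmatrix} n & \sum_i x_{i1} & \sum_i x_{i2} & \cdots & \sum_i x_{ik} \\ \sum_i x_{i1} & \sum_i x_{i1}^2 & \sum_i x_{i1}x_{i2} & \cdots & \sum_i x_{i1}x_{ik} \\ \vdots & \vdots & \vdots & & \vdots \\ \sum_i x_{ik} & \sum_i x_{ik}x_{i1} & \sum_i x_{ik}x_{i2} & \cdots & \sum_i x_{ik}^2 \end{bmatrix}$$

and

$$X'Y = \begin{bmatrix} \sum_i Y_i \\ \sum_i x_{i1}Y_i \\ \vdots \\ \sum_i x_{ik}Y_i \end{bmatrix}$$

It is now easy to see that the matrix equation

$$X'XB = X'Y$$

is equivalent to the set of normal Equations 9.10.1. Assuming that $(X'X)^{-1}$ exists, which is usually the case, we obtain, upon multiplying both sides of the foregoing on the left by $(X'X)^{-1}$, that the least squares estimators are given by

$$B = (X'X)^{-1}X'Y \tag{9.10.3}$$

Program 9.10 computes the least squares estimates, the inverse matrix $(X'X)^{-1}$, and $SS_R$.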
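Equation 9.10.3 translates directly into matrix code. The sketch below assumes NumPy and a design matrix that already contains the leading column of 1s. It returns the same three quantities the text says Program 9.10 computes, but it is only an illustration, not that program; the name `multiple_regression` is hypothetical.

```python
import numpy as np

def multiple_regression(X, y):
    """Return B = (X'X)^{-1} X'Y, the inverse (X'X)^{-1}, and SS_R.

    X is the n x p design matrix whose first column is all 1s; X'X is assumed invertible.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    XtX_inv = np.linalg.inv(X.T @ X)
    B = XtX_inv @ X.T @ y          # Equation 9.10.3
    resid = y - X @ B
    SSR = float(resid @ resid)     # sum of squared residuals
    return B, XtX_inv, SSR

# Noise-free check: Y = 1 + 2 x_1 - 3 x_2 exactly, so B should be (1, 2, -3) and SS_R about 0
x1 = np.array([1, 2, 3, 4, 5, 6], dtype=float)
x2 = np.array([2, 1, 4, 3, 6, 5], dtype=float)
X = np.column_stack([np.ones(6), x1, x2])
y = 1 + 2 * x1 - 3 * x2
B, XtX_inv, SSR = multiple_regression(X, y)
print(B, SSR)
```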

EXAMPLE 9.10a The data in Table 9.4 relate the suicide rate to the population size and the divorce rate at eight different locations.

TABLE 9.4

                   Population      Divorce Rate    Suicide Rate
Location           in Thousands    per 100,000     per 100,000
Akron, Ohio            679            30.4            11.6
Anaheim, Ca.         1,420            34.1            16.1
Buffalo, N.Y.        1,349            17.2             9.3
Austin, Texas          296            26.8             9.1
Chicago, IL.         6,975            29.1             8.4
Columbia, S.C.         323            18.7             7.7
Detroit, Mich.       4,200            32.6            11.3
Gary, Indiana          633            32.5             8.4

Fit a multiple linear regression model to these data. That is, fit a model of the form

$$Y = \beta_0 + \beta_1x_1 + \beta_2x_2 + e$$

where Y is the suicide rate, $x_1$ is the population, and $x_2$ is the divorce rate.

SOLUTION We run Program 9.10, and the results are shown in Figures 9.15, 9.16, and 9.17. Thus the estimated regression equation is

$$Y = 3.5073 - .0002x_1 + .2609x_2$$

The value of $B_1$ indicates that the population does not play a major role in predicting the suicide rate (at least when the divorce rate is also given). Perhaps the population density, rather than the actual population, would have been more useful. ■

FIGURE 9.15 Program 9.10 (Multiple Linear Regression) setup: the X-matrix has 8 rows and 3 columns.
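As a cross-check of the Program 9.10 results quoted in Example 9.10a, the estimates can be computed from Equation 9.10.3 with NumPy. This is a sketch, not Program 9.10 itself.

```python
import numpy as np

# Table 9.4
population = np.array([679, 1420, 1349, 296, 6975, 323, 4200, 633], dtype=float)  # thousands
divorce = np.array([30.4, 34.1, 17.2, 26.8, 29.1, 18.7, 32.6, 32.5])             # per 100,000
suicide = np.array([11.6, 16.1, 9.3, 9.1, 8.4, 7.7, 11.3, 8.4])                  # per 100,000

X = np.column_stack([np.ones(8), population, divorce])  # 8 x 3 design matrix
XtX_inv = np.linalg.inv(X.T @ X)
B = XtX_inv @ X.T @ suicide                             # B = (X'X)^{-1} X'Y
SSR = float(np.sum((suicide - X @ B) ** 2))
print(B)                 # estimates B_0, B_1, B_2
print(np.diag(XtX_inv))  # diagonal of (X'X)^{-1}
print(SSR)
```

It should give $B_0 \approx 3.5073$, $B_1 \approx -.000248$, $B_2 \approx .2609$, and $SS_R \approx 34.12$.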



It follows from Equation 9.10.3 that the least squares estimators $B_0, B_1, \ldots, B_k$, the elements of the matrix B, are all linear combinations of the independent normal random variables $Y_1, \ldots, Y_n$ and so will also be normally distributed. Indeed, in such a situation, namely, when each member of a set of random variables can be expressed as a linear combination of independent normal random variables, we say that the set of random variables has a joint multivariate normal distribution.

The least squares estimators turn out to be unbiased. This can be shown as follows:

$$\begin{aligned}
E[B] &= E[(X'X)^{-1}X'Y] \\
&= E[(X'X)^{-1}X'(X\beta + e)] \qquad \text{since } Y = X\beta + e \\
&= E[(X'X)^{-1}X'X\beta + (X'X)^{-1}X'e] \\
&= E[\beta + (X'X)^{-1}X'e] \\
&= \beta + (X'X)^{-1}X'E[e] \\
&= \beta
\end{aligned}$$

FIGURE 9.16 Program 9.10 entry of the X-matrix for Example 9.10a.



The variances of the least squares estimators can be obtained from the matrix $(X'X)^{-1}$. Indeed, the values of this matrix are related to the covariances of the $B_i$'s. Specifically, the element in the (i + 1)st row, (j + 1)st column of $(X'X)^{-1}$ is equal to $\operatorname{Cov}(B_i, B_j)/\sigma^2$.

To verify the preceding statement concerning $\operatorname{Cov}(B_i, B_j)$, let

$$C = (X'X)^{-1}X'$$

Since X is an n × p matrix and X' a p × n matrix, it follows that X'X is p × p, as is $(X'X)^{-1}$, and so C will be a p × n matrix. Let $C_{ij}$ denote the element in row i, column j of this matrix. Now

$$\begin{bmatrix} B_0 \\ \vdots \\ B_{i-1} \\ \vdots \\ B_k \end{bmatrix} = B = CY = \begin{bmatrix} C_{11} & \cdots & C_{1n} \\ \vdots & & \vdots \\ C_{i1} & \cdots & C_{in} \\ \vdots & & \vdots \\ C_{p1} & \cdots & C_{pn} \end{bmatrix} \begin{bmatrix} Y_1 \\ \vdots \\ Y_n \end{bmatrix}$$

and so, for each i,

$$B_{i-1} = \sum_{r=1}^n C_{ir}Y_r$$

Hence

$$\operatorname{Cov}(B_{i-1}, B_{j-1}) = \operatorname{Cov}\left(\sum_{l=1}^n C_{il}Y_l, \sum_{r=1}^n C_{jr}Y_r\right) = \sum_{r=1}^n\sum_{l=1}^n C_{il}C_{jr}\operatorname{Cov}(Y_l, Y_r)$$

Now $Y_l$ and $Y_r$ are independent when $l \ne r$, and so

$$\operatorname{Cov}(Y_l, Y_r) = \begin{cases} 0 & \text{if } l \ne r \\ \operatorname{Var}(Y_r) & \text{if } l = r \end{cases}$$

Since $\operatorname{Var}(Y_r) = \sigma^2$, we see that

$$\operatorname{Cov}(B_{i-1}, B_{j-1}) = \sigma^2\sum_{r=1}^n C_{ir}C_{jr} = \sigma^2(CC')_{ij} \tag{9.10.4}$$

where $(CC')_{ij}$ is the element in row i, column j of CC'.

FIGURE 9.17 Program 9.10 output for Example 9.10a. Estimates of the regression coefficients: B(0) = 3.5073534, B(1) = -0.0002477, B(2) = 0.2609466. Inverse matrix $(X'X)^{-1}$, by rows: (2.78312, 0.00002, -9.73E-02), (0.00002, 2.70E-08, -2.55E-06), (-9.73E-02, -2.55E-06, 0.0037). The sum of the squares of the residuals is $SS_R$ = 34.1212.

If we now let Cov(B) denote the matrix of covariances, that is,

$$\operatorname{Cov}(B) = \begin{bmatrix} \operatorname{Cov}(B_0, B_0) & \cdots & \operatorname{Cov}(B_0, B_k) \\ \vdots & & \vdots \\ \operatorname{Cov}(B_k, B_0) & \cdots & \operatorname{Cov}(B_k, B_k) \end{bmatrix}$$

then it follows from Equation 9.10.4 that

$$\operatorname{Cov}(B) = \sigma^2CC' \tag{9.10.5}$$

Now

$$C' = \left((X'X)^{-1}X'\right)' = X\left((X'X)^{-1}\right)' = X(X'X)^{-1}$$

where the last equality follows since $(X'X)^{-1}$ is symmetric (since X'X is) and so is equal to its transpose. Hence

$$CC' = (X'X)^{-1}X'X(X'X)^{-1} = (X'X)^{-1}$$

and so we can conclude from Equation 9.10.5 that

$$\operatorname{Cov}(B) = \sigma^2(X'X)^{-1} \tag{9.10.6}$$

Since $\operatorname{Cov}(B_i, B_i) = \operatorname{Var}(B_i)$, it follows that the variances of the least squares estimators are given by σ² multiplied by the diagonal elements of $(X'X)^{-1}$.

The quantity σ² can be estimated by using the sum of squares of the residuals. That is, if we let

$$SS_R = \sum_{i=1}^n (Y_i - B_0 - B_1x_{i1} - B_2x_{i2} - \cdots - B_kx_{ik})^2$$

then it can be shown that

$$\frac{SS_R}{\sigma^2} \sim \chi^2_{n-(k+1)}$$

and so

$$E\left[\frac{SS_R}{\sigma^2}\right] = n - k - 1$$

or

$$E[SS_R/(n - k - 1)] = \sigma^2$$

That is, $SS_R/(n - k - 1)$ is an unbiased estimator of σ². In addition, as in the case of simple linear regression, $SS_R$ will be independent of the least squares estimators $B_0, B_1, \ldots, B_k$.
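The results $E[B] = \beta$, $\operatorname{Cov}(B) = \sigma^2(X'X)^{-1}$, and $E[SS_R/(n-k-1)] = \sigma^2$ can be checked by simulation. The sketch assumes NumPy. The design matrix, the coefficient vector β, and σ are made-up values, and each row of the simulated array is one data set generated from $Y = X\beta + e$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k, sigma = 20, 2, 2.0
# Fixed (made-up) design matrix with a leading column of 1s
X = np.column_stack([np.ones(n), np.linspace(0, 5, n), rng.uniform(0, 3, n)])
beta = np.array([1.0, 0.5, -2.0])
XtX_inv = np.linalg.inv(X.T @ X)
C = XtX_inv @ X.T                                    # C = (X'X)^{-1} X', so that B = C Y

reps = 20_000
Y = X @ beta + rng.normal(0, sigma, size=(reps, n))  # each row: one simulated response vector
B = Y @ C.T                                          # each row: the estimates B_0, B_1, B_2
SSR = np.sum((Y - B @ X.T) ** 2, axis=1)             # residual sum of squares for each data set

emp = np.diag(np.cov(B, rowvar=False))               # empirical variances of B_0, B_1, B_2
theo = sigma ** 2 * np.diag(XtX_inv)                 # sigma^2 times the diagonal of (X'X)^{-1}
print(B.mean(axis=0))                                # close to beta (unbiasedness)
print(emp, theo)
print(np.mean(SSR / (n - k - 1)), sigma ** 2)        # average of SS_R/(n-k-1) versus sigma^2
```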

REMARK 

If we let $r_i$ denote the ith residual

$$r_i = Y_i - B_0 - B_1x_{i1} - \cdots - B_kx_{ik}, \qquad i = 1, \ldots, n$$

then

$$r = Y - XB$$

where $r = (r_1, \ldots, r_n)'$. Hence, we may write

$$\begin{aligned}
SS_R &= \sum_{i=1}^n r_i^2 \\
&= r'r \\
&= (Y - XB)'(Y - XB) \\
&= [Y' - (XB)'](Y - XB) \\
&= (Y' - B'X')(Y - XB) \\
&= Y'Y - Y'XB - B'X'Y + B'X'XB \\
&= Y'Y - Y'XB
\end{aligned} \tag{9.10.7}$$

where the last equality follows from the normal equations

$$X'XB = X'Y$$

Because Y' is 1 × n, X is n × p, and B is p × 1, it follows that Y'XB is a 1 × 1 matrix. That is, Y'XB is a scalar and thus is equal to its transpose, which shows that

$$Y'XB = (Y'XB)' = B'X'Y$$

Hence, using Equation 9.10.7 we have proven the following identity:

$$SS_R = Y'Y - B'X'Y$$

The foregoing is a useful computational formula for $SS_R$ (though one must be careful of possible roundoff error when using it).
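Here is a quick numerical check of the identity $SS_R = Y'Y - B'X'Y$, assuming NumPy and simulated data (all values illustrative). The second part adds a large constant to the responses. In that case $Y'Y$ and $B'X'Y$ are both huge and nearly equal, which is exactly the situation in which the roundoff caution above applies.

```python
import numpy as np

rng = np.random.default_rng(6)
X = np.column_stack([np.ones(10), rng.uniform(0, 10, size=(10, 2))])
Y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(size=10)

B = np.linalg.solve(X.T @ X, X.T @ Y)   # solves the normal equations X'XB = X'Y
r = Y - X @ B                           # residual vector r = Y - XB
ssr_direct = r @ r                      # SS_R = r'r
ssr_formula = Y @ Y - B @ (X.T @ Y)     # SS_R = Y'Y - B'X'Y
print(ssr_direct, ssr_formula)

# With a large common offset, the subtraction in the computational formula
# cancels most significant digits and may lose accuracy
Y_big = Y + 1e8
B_big = np.linalg.solve(X.T @ X, X.T @ Y_big)
r_big = Y_big - X @ B_big
print(r_big @ r_big, Y_big @ Y_big - B_big @ (X.T @ Y_big))
```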

EXAMPLE 9.10b For the data of Example 9.10a, we computed that $SS_R = 34.12$. Since n = 8 and k = 2, the estimate of σ² is 34.12/5 = 6.824. ■

EXAMPLE 9.10c The diameter of a tree at its breast height is influenced by many factors. The data in Table 9.5 relate the diameter of a particular type of eucalyptus tree to its age, the average rainfall at its site, the site's elevation, and the wood's mean specific gravity. (The data come from R. G. Skolmen, 1975, "Shrinkage and Specific Gravity Variation in Robusta Eucalyptus Wood Grown in Hawaii." USDA Forest Service PSW-298.)

Assuming a linear regression model of the form

$$Y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_3 + \beta_4x_4 + e$$

where $x_1$ is the age, $x_2$ is the elevation, $x_3$ is the rainfall, $x_4$ is the specific gravity, and Y is the tree's diameter, test the hypothesis that $\beta_2 = 0$. That is, test the hypothesis that, given the other three factors, the elevation of the tree does not affect its diameter.

SOLUTION To test this hypothesis, we begin by running Program 9.10, which yields, among other things, the following (where $(X'X)^{-1}_{3,3}$ denotes the element in row 3, column 3 of $(X'X)^{-1}$):

$$(X'X)^{-1}_{3,3} = .379, \qquad SS_R = 19.262, \qquad B_2 = .075$$

TABLE 9.5

                                                     Diameter at
        Age      Elevation    Rainfall   Specific    Breast Height
      (years)   (1,000 ft)   (inches)    Gravity      (inches)
  1     44         1.3          250        .63          18.1
  2     33         2.2          115        .59          19.6
  3     33         2.2           75        .56          16.6
  4     32         2.6           85        .55          16.4
  5     34         2.0          100        .54          16.9
  6     31         1.8           75        .59          17.0
  7     33         2.2           85        .56          20.0
  8     30         3.6           75        .46          16.6
  9     34         1.6          225        .63          16.2
 10     34         1.5          250        .60          18.5
 11     33         2.2          255        .63          18.7
 12     36         1.7          175        .58          19.4
 13     33         2.2           75        .55          17.6
 14     34         1.3           85        .57          18.3
 15     37         2.6           90        .62          18.8


It now follows from Equation 9.10.6 that

$$\operatorname{Var}(B_2) = .379\sigma^2$$

Since $B_2$ is normal and

$$E[B_2] = \beta_2$$

we see (using $\sqrt{.379} \approx .616$) that

$$\frac{B_2 - \beta_2}{.616\sigma} \sim N(0, 1)$$

Replacing σ by its estimator $\sqrt{SS_R/10}$ transforms the foregoing standard normal distribution into a t-distribution with 10 (= n - k - 1) degrees of freedom. That is,

$$\frac{B_2 - \beta_2}{.616\sqrt{SS_R/10}} \sim t_{10}$$

Hence, if $\beta_2 = 0$, then

$$\frac{\sqrt{10/SS_R}\;B_2}{.616} \sim t_{10}$$

Since the value of the preceding statistic is $(\sqrt{10/19.262})(.075)/.616 = .088$, the p-value of the test of the hypothesis that $\beta_2 = 0$ is

$$\begin{aligned} p\text{-value} &= P\{|T_{10}| > .088\} \\ &= 2P\{T_{10} > .088\} \\ &= .9316 \qquad \text{by Program 5.8.2.A} \end{aligned}$$

Hence, the hypothesis is accepted (and, in fact, would be accepted at any significance level less than .9316). ■
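The test statistic and p-value of Example 9.10c can be recomputed from the reported quantities. This sketch assumes SciPy, whose `scipy.stats.t.sf` gives the upper-tail probability of the t distribution; it stands in for Program 5.8.2.A.

```python
import math
from scipy import stats

n, k = 15, 4
ss_r, b2, c33 = 19.262, 0.075, 0.379           # SS_R, B_2 and the (3,3) element of (X'X)^{-1}
df = n - k - 1                                 # 10 degrees of freedom
se_b2 = math.sqrt(c33) * math.sqrt(ss_r / df)  # estimated standard deviation of B_2
t_stat = b2 / se_b2                            # test statistic when beta_2 = 0
p_value = 2 * stats.t.sf(abs(t_stat), df)      # P{|T_10| > t_stat}
print(round(t_stat, 4), round(p_value, 4))
```

Using $\sqrt{.379}$ instead of the rounded .616 gives a statistic of about .0878 and a p-value of about .932, in agreement with the text up to rounding.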

REMARK 

The quantity

$$R^2 = 1 - \frac{SS_R}{\sum_i (Y_i - \bar{Y})^2}$$

which measures the proportional reduction in the sum of squares of the residuals obtained when using the model

$$Y = \beta_0 + \beta_1x_1 + \cdots + \beta_kx_k + e$$

as opposed to the model

$$Y = \beta_0 + e$$

is called the coefficient of multiple determination.
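For the data of Example 9.10a, the coefficient of multiple determination can be computed with the NumPy sketch below. The residual sum of squares of the model $Y = \beta_0 + e$ is $\sum_i (Y_i - \bar{Y})^2$, since its least squares estimate of $\beta_0$ is $\bar{Y}$.

```python
import numpy as np

# Table 9.4
population = [679, 1420, 1349, 296, 6975, 323, 4200, 633]
divorce = [30.4, 34.1, 17.2, 26.8, 29.1, 18.7, 32.6, 32.5]
y = np.array([11.6, 16.1, 9.3, 9.1, 8.4, 7.7, 11.3, 8.4])

X = np.column_stack([np.ones(len(y)), population, divorce])
B, *_ = np.linalg.lstsq(X, y, rcond=None)
SSR = np.sum((y - X @ B) ** 2)     # residual sum of squares, full model
SST = np.sum((y - y.mean()) ** 2)  # residual sum of squares, model Y = beta_0 + e
R2 = 1 - SSR / SST
print(round(R2, 3))
```

The result, about .353, says that population and divorce rate together account for roughly 35 percent of the variation in these suicide rates.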

9.10.1 Predicting Future Responses 

Let us now suppose that a series of experiments is to be performed using the input levels 
x\,...,xif Based on our data, consisting of the prior responses Y\,...,Y n , suppose we 
would like to estimate the mean response. Since the mean response is 

E[Y\x] = 0o + 0i*i +--- + PkXi 

a point estimate of it is simply X!/=o ^i x > wnere #0 — 1- 

To determine a confidence interval estimator, we need the distribution of X^=o Bi x i- 
Because it can be expressed as a linear combination of the independent normal random 
variables Yj, i = 1, . . . , n, it follows that it is also normally distributed. Its mean and 
variance are obtained as follows: 



/, XiBj 



/=o 



k 
= 2_^ Xifii since E[Bi\ = Pi 



(9.10.8) 
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That is, it is an unbiased estimator. Also, using the fact that the variance of a random 
variable is equal to the covariance between that random variable and itself, we see that 



Var [ J2 X ' B ' = Gov J2 X ' B ''J1 X J B J 
,/=0 / \i=0 j=0 

k k 

= 1_j2-j x i x jC° v ( B i' B j- 
i-0 j=0 



(9.10.9) 



If we let x denote the matrix 

"~x 

x\ 

then, recalling that Cov(fiy, Bj)la is the element in the (i + l)st row and (j + l)st column 
of (X'X) , we can express Equation 9.10.9 as 



$$\operatorname{Var}\left(\sum_{i=0}^{k} x_i B_i\right) = \mathbf{x}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}\,\sigma^2 \tag{9.10.10}$$



Using Equations 9.10.8 and 9.10.10, we see that

$$\frac{\sum_{i=0}^{k} x_i B_i - \sum_{i=0}^{k} x_i \beta_i}{\sigma\sqrt{\mathbf{x}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}}} \sim N(0, 1)$$



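If we now replace $\sigma$ by its estimator $\sqrt{SS_R/(n-k-1)}$, we obtain, by the usual argument, that

$$\frac{\sum_{i=0}^{k} x_i B_i - \sum_{i=0}^{k} x_i \beta_i}{\sqrt{\dfrac{SS_R}{n-k-1}}\,\sqrt{\mathbf{x}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}}} \sim t_{n-k-1}$$

which gives rise to the following confidence interval estimator of $\sum_{i=0}^{k} x_i \beta_i$.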

Confidence Interval Estimate of $E[Y \mid \mathbf{x}] = \sum_{i=0}^{k} x_i \beta_i$ ($x_0 = 1$)

A $100(1-\alpha)$ percent confidence interval estimate of $\sum_{i=0}^{k} x_i \beta_i$ is given by

$$\sum_{i=0}^{k} x_i b_i \pm \sqrt{\frac{ss_r}{n-k-1}}\,\sqrt{\mathbf{x}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}}\; t_{\alpha/2,\,n-k-1}$$

where $b_0, \ldots, b_k$ are the values of the least squares estimators $B_0, \ldots, B_k$, and $ss_r$ is the value of $SS_R$.

TABLE 9.6

Hardness   Copper Content   Annealing Temperature
                            (units of 1,000°F)
  79.2          .02                1.05
  64.0          .03                1.20
  55.7          .03                1.25
  56.3          .04                1.30
  58.6          .10                1.30
  84.3          .15                1.00
  70.4          .15                1.10
  61.3          .09                1.20
  51.3          .13                1.40
  49.8          .09                1.40

EXAMPLE 9.10d A steel company is planning to produce cold reduced sheet steel consisting
of .15 percent copper at an annealing temperature of 1,150°F, and is interested
in estimating the average (Rockwell 30-T) hardness of a sheet. To determine this, they 
have collected the data shown in Table 9.6 on 10 different specimens of sheet steel having 
different copper contents and annealing temperatures. Estimate the average hardness and 
determine an interval in which it will lie with 95 percent confidence. 

SOLUTION To solve this, we first run Program 9.10, which gives the results shown in 
Figures 9.18, 9.19, and 9.20. 

Hence, a point estimate of the expected hardness of sheets containing .15 percent 
copper at an annealing temperature of 1,150°F is 69.862. In addition, since $t_{.025,7} = 2.365$,
a 95 percent confidence interval for this value is

$$69.862 \pm 4.083$$

■
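The numbers in this example can be reproduced from Table 9.6 with a few lines of plain Python. The sketch below is not Program 9.10. It assumes that forming $\mathbf{X}'\mathbf{X}$ and inverting it directly, here by Gauss-Jordan elimination (a swapped-in technique), is numerically adequate for this small, well-scaled problem. The helper name `invert` is hypothetical, and the value $t_{.025,7} = 2.365$ is taken from the text rather than computed.

```python
import math

# Table 9.6: (hardness, copper content, annealing temperature in units of 1,000 deg F)
data = [
    (79.2, .02, 1.05), (64.0, .03, 1.20), (55.7, .03, 1.25), (56.3, .04, 1.30),
    (58.6, .10, 1.30), (84.3, .15, 1.00), (70.4, .15, 1.10), (61.3, .09, 1.20),
    (51.3, .13, 1.40), (49.8, .09, 1.40),
]
y = [row[0] for row in data]
X = [[1.0, row[1], row[2]] for row in data]  # leading 1 multiplies beta_0
n, p = len(X), len(X[0])                     # p = k + 1 = 3

def invert(m):
    """Gauss-Jordan inversion with partial pivoting (adequate for a small matrix)."""
    size = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(size)] for i, row in enumerate(m)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        scale = a[col][col]
        a[col] = [v / scale for v in a[col]]
        for r in range(size):
            if r != col:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [row[size:] for row in a]

XtX = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)] for i in range(p)]
Xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(p)]
XtX_inv = invert(XtX)
b = [sum(XtX_inv[i][j] * Xty[j] for j in range(p)) for i in range(p)]  # least squares b_0, b_1, b_2

# Sum of squared residuals SS_R
ss_r = sum((y[r] - sum(b[i] * X[r][i] for i in range(p))) ** 2 for r in range(n))

x0 = [1.0, .15, 1.15]                        # .15 percent copper, 1,150 deg F
estimate = sum(bi * xi for bi, xi in zip(b, x0))
quad = sum(x0[i] * XtX_inv[i][j] * x0[j] for i in range(p) for j in range(p))  # x'(X'X)^{-1}x
t_crit = 2.365                               # t_{.025,7}, taken from the text
half_length = t_crit * math.sqrt(ss_r / (n - p)) * math.sqrt(quad)
print(round(b[0], 4), round(ss_r, 4), round(estimate, 3), round(half_length, 3))
```

Note that $n - p = n - k - 1 = 7$, the degrees of freedom used in the interval.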

When it is only a single experiment that is going to be performed at the input levels $x_1, \ldots, x_k$, we are usually more concerned with predicting the actual response than its mean value. That is, we are interested in utilizing our data set $Y_1, \ldots, Y_n$ to predict

$$Y(\mathbf{x}) = \sum_{i=0}^{k} \beta_i x_i + e, \quad \text{where } x_0 = 1$$

A point prediction is given by $\sum_{i=0}^{k} B_i x_i$, where $B_i$ is the least squares estimator of $\beta_i$ based on the set of prior responses $Y_1, \ldots, Y_n$, $i = 0, \ldots, k$.
FIGURE 9.18 Program 9.10 data-entry screen for Example 9.10d (rows of the design matrix X: 1, copper content, annealing temperature).



To determine a prediction interval for $Y(\mathbf{x})$, note first that since $B_0, \ldots, B_k$ are based on prior responses, it follows that they are independent of $Y(\mathbf{x})$. Hence, it follows that $Y(\mathbf{x}) - \sum_{i=0}^{k} B_i x_i$ is normal with mean 0 and variance given by

$$\begin{aligned}
\operatorname{Var}\left[Y(\mathbf{x}) - \sum_{i=0}^{k} B_i x_i\right] &= \operatorname{Var}[Y(\mathbf{x})] + \operatorname{Var}\left(\sum_{i=0}^{k} B_i x_i\right) && \text{by independence} \\
&= \sigma^2 + \sigma^2\,\mathbf{x}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x} && \text{from Equation 9.10.10}
\end{aligned}$$

and so

$$\frac{Y(\mathbf{x}) - \sum_{i=0}^{k} B_i x_i}{\sigma\sqrt{1 + \mathbf{x}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}}} \sim N(0, 1)$$

which yields, upon replacing $\sigma$ by its estimator, that

$$\frac{Y(\mathbf{x}) - \sum_{i=0}^{k} B_i x_i}{\sqrt{\dfrac{SS_R}{n-k-1}}\,\sqrt{1 + \mathbf{x}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}}} \sim t_{n-k-1}$$

We thus have:

Prediction Interval for $Y(\mathbf{x})$

With $100(1-\alpha)$ percent confidence, $Y(\mathbf{x})$ will lie between

$$\sum_{i=0}^{k} x_i b_i \pm \sqrt{\frac{ss_r}{n-k-1}}\,\sqrt{1 + \mathbf{x}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}}\; t_{\alpha/2,\,n-k-1}$$

where $b_0, \ldots, b_k$ are the values of the least squares estimators $B_0, \ldots, B_k$, and $ss_r$ is the value of $SS_R$.

FIGURES 9.19 and 9.20 Program 9.10 output for Example 9.10d. Recoverable values: estimate of the regression coefficient B(0) = 160.2928774; sum of the squares of the residuals SS_R = 66.6593; sum of x(i)B(i) = 69.86226; sqrt(x'(X'X)^{-1}x) = 0.55946; sqrt(SS_R/(n - k - 1)) = 3.0859. (The printed inverse matrix (X'X)^{-1} and the remaining coefficient estimates are not recoverable.)

EXAMPLE 9.10e If in Example 9.10d we were interested in determining an interval in which the hardness of a single steel sheet, produced with a copper content of .15 percent and at an annealing temperature of 1,150°F, would lie, then the midpoint of the prediction interval would be as given before. However, the half-length of this prediction interval would exceed that of the confidence interval for the mean value by the factor $\sqrt{1.313}/\sqrt{.313}$. That is, the 95 percent prediction interval is

$$69.862 \pm 8.363$$

■
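The relation between the two half-lengths can be checked directly from the values reported by Program 9.10 for Example 9.10d. This small sketch simply plugs those reported numbers into the two interval formulas; the variable names are hypothetical.

```python
import math

# Values reported for Example 9.10d
t_crit = 2.365   # t_{.025,7}
s = 3.0859       # sqrt(SS_R / (n - k - 1))
q = 0.55946      # sqrt(x'(X'X)^{-1} x)

ci_half = t_crit * s * q                     # confidence interval for E[Y | x]
pi_half = t_crit * s * math.sqrt(1 + q * q)  # prediction interval for Y(x)
print(round(ci_half, 3), round(pi_half, 3), round(pi_half / ci_half, 3))
```

The extra 1 under the square root accounts for the variance of the new error term, which is why the prediction interval is roughly twice as wide here.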

9.11 LOGISTIC REGRESSION MODELS FOR BINARY
OUTPUT DATA

In this section we consider experiments that result in either a success or a failure. We will suppose that these experiments can be performed at various levels, and that an experiment performed at level $x$ will result in a success with probability $p(x)$, $-\infty < x < \infty$. If $p(x)$ is of the form

$$p(x) = \frac{e^{a+bx}}{1 + e^{a+bx}}$$

then the experiments are said to come from a logistic regression model and $p(x)$ is called the logistic regression function. If $b > 0$, then $p(x) = 1/[e^{-(a+bx)} + 1]$ is an increasing function that converges to 1 as $x \to \infty$; if $b < 0$, then $p(x)$ is a decreasing function that converges to 0 as $x \to \infty$. (When $b = 0$, $p(x)$ is constant.) Plots of logistic regression functions are given in Figure 9.21. Notice the S-shape of these curves.
FIGURE 9.21 Logistic regression functions.



Writing $p(x) = 1 - [1/(1 + e^{a+bx})]$ and differentiating gives that

$$\frac{d}{dx}p(x) = \frac{b\,e^{a+bx}}{[1 + e^{a+bx}]^2} = b\,p(x)[1 - p(x)]$$

Thus the rate of change of $p(x)$ depends on $x$ and is largest in absolute value at those values of $x$ for which $p(x)$ is near .5. For instance, at the value $x$ such that $p(x) = .5$, the rate of change is $\frac{d}{dx}p(x) = .25b$, whereas at that value $x$ for which $p(x) = .8$ the rate of change is $.16b$.
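A quick numerical check of the derivative formula, assuming hypothetical parameter values $a = -2$, $b = .5$ (not from the text): the closed form $b\,p(x)[1-p(x)]$ is compared with a central-difference approximation at the level where $p(x) = .5$, and evaluated at the level where $p(x) = .8$. The function names are illustrative only.

```python
import math

def logistic(x, a, b):
    """Logistic regression function p(x) = e^{a+bx} / (1 + e^{a+bx})."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))

def slope(x, a, b):
    """Closed-form derivative b * p(x) * (1 - p(x))."""
    p = logistic(x, a, b)
    return b * p * (1 - p)

a, b = -2.0, 0.5                  # hypothetical parameters
x_half = -a / b                   # level where p(x) = .5
h = 1e-6
numeric = (logistic(x_half + h, a, b) - logistic(x_half - h, a, b)) / (2 * h)
print(slope(x_half, a, b), numeric)        # both approximately .25 * b

x_eighty = (math.log(0.8 / 0.2) - a) / b   # level where p(x) = .8
print(slope(x_eighty, a, b))               # approximately .16 * b
```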
If we let $o(x)$ be the odds for success when the experiment is run at level $x$, then

$$o(x) = \frac{p(x)}{1 - p(x)} = e^{a+bx}$$

Thus, when $b > 0$, the odds increase exponentially in the input level $x$; when $b < 0$, the odds decrease exponentially in the input level $x$. Taking logs of the preceding shows that the log odds, called the logit, is a linear function:

$$\log[o(x)] = a + bx$$
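The sketch below illustrates the two facts just derived, again with hypothetical parameters $a = -2$, $b = .5$: the log odds computed from $p(x)$ equal $a + bx$, and each unit increase in $x$ multiplies the odds by $e^b$.

```python
import math

a, b = -2.0, 0.5  # hypothetical parameters, not from the text

def odds(x):
    """Odds for success p(x) / (1 - p(x)) under the logistic model."""
    p = 1.0 / (1.0 + math.exp(-(a + b * x)))
    return p / (1 - p)

for x in [0.0, 2.0, 4.0, 6.0]:
    print(x, odds(x), math.log(odds(x)))  # the logit equals a + b*x

# A unit increase in x multiplies the odds by e^b
ratio = odds(3.0) / odds(2.0)
print(ratio, math.exp(b))
```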



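The parameters $a$ and $b$ of the logistic regression function are assumed to be unknown and need to be estimated. This can be accomplished by using the maximum likelihood approach. That is, suppose that the experiment is to be performed at levels $x_1, \ldots, x_k$. Let $Y_i$ be the result (either 1 if a success, or 0 if a failure) of the experiment when performed at level $x_i$. Then, using the Bernoulli density function (that is, the binomial density for a single trial), gives

$$P\{Y_i = y_i\} = [p(x_i)]^{y_i}[1 - p(x_i)]^{1-y_i} = \left(\frac{e^{a+bx_i}}{1+e^{a+bx_i}}\right)^{y_i}\left(\frac{1}{1+e^{a+bx_i}}\right)^{1-y_i}, \quad y_i = 0, 1$$

Thus, assuming the experiments are independent, the probability that the experiment at level $x_i$ results in outcome $y_i$, for all $i = 1, \ldots, k$, is

$$P\{Y_i = y_i,\ i = 1, \ldots, k\} = \prod_{i=1}^{k}\left(\frac{e^{a+bx_i}}{1+e^{a+bx_i}}\right)^{y_i}\left(\frac{1}{1+e^{a+bx_i}}\right)^{1-y_i} = \prod_{i=1}^{k}\frac{\left(e^{a+bx_i}\right)^{y_i}}{1+e^{a+bx_i}}$$

Taking logarithms gives that

$$\log\left(P\{Y_i = y_i,\ i = 1, \ldots, k\}\right) = \sum_{i=1}^{k} y_i(a + bx_i) - \sum_{i=1}^{k}\log\left(1 + e^{a+bx_i}\right)$$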

The maximum likelihood estimates can now be obtained by numerically finding the values of $a$ and $b$ that maximize the preceding likelihood. However, because the equations obtained by setting the partial derivatives of the log-likelihood equal to 0 are nonlinear in $a$ and $b$, this requires an iterative approach; consequently, one typically resorts to specialized software to obtain the estimates.
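As one example of such an iterative approach, the sketch below applies Newton's method to the log-likelihood just derived. Newton's method is a standard choice swapped in here as an assumption; the text does not prescribe a particular algorithm. The data are hypothetical 0/1 outcomes chosen only so that successes and failures overlap (otherwise the maximum likelihood estimates do not exist), and the function name `fit_logistic` is illustrative.

```python
import math

def fit_logistic(xs, ys, max_iter=50, tol=1e-10):
    """Maximum likelihood estimates (a, b) for the logistic regression model.

    Maximizes sum_i y_i (a + b x_i) - sum_i log(1 + e^{a + b x_i})
    by Newton's method, starting from a = b = 0.
    """
    a, b = 0.0, 0.0
    for _ in range(max_iter):
        ps = [1.0 / (1.0 + math.exp(-(a + b * x))) for x in xs]
        # Gradient of the log-likelihood with respect to a and b
        ga = sum(y - p for y, p in zip(ys, ps))
        gb = sum(x * (y - p) for x, y, p in zip(xs, ys, ps))
        # Negative Hessian: sum of p(1 - p) times [1, x; x, x^2]
        w = [p * (1.0 - p) for p in ps]
        haa = sum(w)
        hab = sum(wi * x for wi, x in zip(w, xs))
        hbb = sum(wi * x * x for wi, x in zip(w, xs))
        det = haa * hbb - hab * hab
        # Newton step: solve [haa hab; hab hbb] (da, db) = (ga, gb)
        da = (hbb * ga - hab * gb) / det
        db = (haa * gb - hab * ga) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b

# Hypothetical 0/1 outcomes at levels 1..8 (illustration only, not data from the text)
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
a_hat, b_hat = fit_logistic(xs, ys)
print(a_hat, b_hat)
```

Because the log-likelihood is concave in $(a, b)$, Newton's method typically converges in a handful of iterations; at the solution both components of the gradient are (numerically) zero.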

Whereas the logistic regression model is the most frequently used model when the response data are binary, other models are often employed. For instance, in situations where it is reasonable to suppose that $p(x)$, the probability of a positive response when the input level is $x$, is an increasing function of $x$, it is often supposed that $p(x)$ has the form of a specified probability distribution function. Indeed, when $b > 0$, the logistic regression model is of this form because $p(x)$ is equal to the distribution function of a logistic random variable (Section 5.9) with parameters $\mu = -a/b$, $v = 1/b$. Another model of this type is the probit model, which supposes that for some constants $\alpha$ and $\beta > 0$,

$$p(x) = \Phi(\alpha + \beta x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\alpha+\beta x} e^{-y^2/2}\,dy$$

In other words, $p(x)$ is equal to the probability that a standard normal random variable is less than $\alpha + \beta x$.
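The sketch below checks numerically that the logistic regression function with $b > 0$ coincides with the logistic distribution function with $\mu = -a/b$, $v = 1/b$, and evaluates a probit curve by writing $\Phi$ in terms of the error function, $\Phi(z) = [1 + \operatorname{erf}(z/\sqrt{2})]/2$. The parameter values $a = \alpha = -2$, $b = \beta = .5$ are hypothetical, chosen so that both curves equal .5 at $x = 4$.

```python
import math

def probit_p(x, alpha, beta):
    """Probit model p(x) = Phi(alpha + beta*x), with Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf((alpha + beta * x) / math.sqrt(2.0)))

def logistic_p(x, a, b):
    """Logistic regression function p(x) = 1 / (1 + e^{-(a + bx)})."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))

def logistic_cdf(x, mu, v):
    """Distribution function of a logistic random variable with parameters mu and v."""
    return 1.0 / (1.0 + math.exp(-(x - mu) / v))

a, b = -2.0, 0.5  # hypothetical parameters (b > 0)
for x in [0.0, 2.0, 4.0, 6.0, 8.0]:
    print(x, logistic_p(x, a, b), logistic_cdf(x, -a / b, 1 / b), probit_p(x, a, b))
```

Both curves are S-shaped and increasing; the probit curve approaches 0 and 1 more quickly because the normal distribution has lighter tails than the logistic.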

EXAMPLE 9.11a A common assumption for whether an animal becomes sick when exposed to a chemical at dosage level $x$ is to assume a threshold model, which supposes that each animal has a random threshold and will become ill if the dosage level exceeds that threshold. The exponential distribution has sometimes been used as the threshold distribution. For instance, a model considered in Freedman and Zeisel ("From Mouse to Man: The Quantitative Assessment of Cancer Risks," Statistical Science, 1988, 3, 1, 3-56) supposes that a mouse exposed to $x$ units of DDT (measured in ppm) will contract cancer of the liver with probability

$$p(x) = 1 - e^{-ax}, \quad x > 0$$

Because of the lack of memory of the exponential distribution, this is equivalent to assuming that a mouse that is still healthy after receiving a (partial) dosage of level $x$ is as good as it was before receiving any dosage.

It was reported in Freedman and Zeisel that 84 of 111 mice exposed to DDT at a level of 250 ppm developed cancer. Therefore, $a$ can be estimated from

$$1 - e^{-250a} = \frac{84}{111}$$

or

$$a = -\frac{\log(27/111)}{250} = .005655$$
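The estimate above comes from equating $p(250)$ to the observed proportion $84/111$ and solving for $a$. Since that proportion is the maximum likelihood estimate of $p(250)$ and $a \mapsto 1 - e^{-250a}$ is one-to-one, the resulting value is also the maximum likelihood estimate of $a$. A minimal sketch of the arithmetic, using only the counts quoted in the example:

```python
import math

sick, total, dose = 84, 111, 250.0  # counts and dose level quoted in Example 9.11a

# Solve 1 - exp(-a * dose) = sick / total for a
a_hat = -math.log(1 - sick / total) / dose
print(round(a_hat, 6))
```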



Problems 

1. The following data relate x, the moisture of a wet mix of a certain product, to Y,
the density of the finished product.

x_i    5     6     7      10     12     15     18     20
Y_i   7.4   9.3   10.6   15.4   18.1   22.2   24.1   24.8



(a) Draw a scatter diagram. 

(b) Fit a linear curve to the data. 
2. The following data relate the number of units of a good that were ordered as a
function of the price of the good at six different locations.

Number ordered   88   112   123   136   158   172
Price            50    40    35    30    20    15



How many units do you think would be ordered if the price were 25? 

3. The corrosion of a certain metallic substance has been studied in dry oxygen at 
500 degrees Centigrade. In this experiment, the gain in weight after various periods 
of exposure was used as a measure of the amount of oxygen that had reacted with 
the sample. Here are the data: 



Hours          1.0   2.0   2.5    3.0    3.5   4.0
Percent Gain   .02   .03   .035   .042   .05   .054



(a) Plot a scatter diagram. 

(b) Fit a linear relation. 

(c) Predict the percent weight gain when the metal is exposed for 3.2 hours. 

4. The following data indicate the relationship between x, the specific gravity of 
a wood sample, and Y, its maximum crushing strength in compression parallel to 
the grain. 



x_i    y_i (psi)    x_i    y_i (psi)
.41     1,850       .39     1,760
.46     2,620       .41     2,500
.44     2,340       .44     2,750
.47     2,690       .43     2,730
.42     2,160       .44     3,120



(a) Plot a scatter diagram. Does a linear relationship seem reasonable? 

(b) Estimate the regression coefficients. 

(c) Predict the maximum crushing strength of a wood sample whose specific 
gravity is .43. 
5. The following data indicate the gain in reading speed versus the number of weeks
in the program of 10 students in a speed-reading program.

Number of Weeks   Speed Gain (wds/min)
       2                  21
       3                  42
       8                 102
      11                 130
       4                  52
       5                  57
       9                 105
       7                  85
       5                  62
       7                  90



(a) Plot a scatter diagram to see if a linear relationship is indicated. 

(b) Find the least squares estimates of the regression coefficients. 

(c) Estimate the expected gain of a student who plans to take the program for 
7 weeks. 

6. Infrared spectroscopy is often used to determine the natural rubber content of 
mixtures of natural and synthetic rubber. For mixtures of known percentages, the 
infrared spectroscopy gave the following readings: 



Percentage    0      20     40      60      80      100
Reading      .734   .885   1.050   1.191   1.314   1.432



If a new mixture gives an infrared spectroscopy reading of 1.15, estimate its 
percentage of natural rubber. 

7. The following table gives the 1996 SAT mean math and verbal scores in each 
state and the District of Columbia, along with the percentage of the states' graduating
high school students that took the examination. Use data relating to the
first 20 locations listed (Alabama to Maine) to develop a prediction of the mean 
student mathematics score in terms of the percentage of students that take the 
examination. Then compare your predicted values for the next 5 states (based on 
the percentage taking the exam in these states) with the actual mean math scores. 



SAT Mean Scores by State, 1996 (recentered scale)

                                            % Graduates
                         Verbal    Math     Taking SAT
Alabama                   565      558          8
Alaska                    521      513         47
Arizona                   525      521         28
Arkansas                  566      550          6
California                495      511         45
Colorado                  536      538         30
Connecticut               507      504         79
Delaware                  508      495         66
Dist. of Columbia         489      473         50
Florida                   498      496         48
Georgia                   484      477         63
Hawaii                    485      510         54
Idaho                     543      536         15
Illinois                  564      575         14
Indiana                   494      494         57
Iowa                      590      600          5
Kansas                    579      571          9
Kentucky                  549      544         12
Louisiana                 559      550          9
Maine                     504      498         68
Maryland                  507      504         64
Massachusetts             507      504         80
Michigan                  557      565         11
Minnesota                 582      593          9
Mississippi               569      557          4
Missouri                  570      569          9
Montana                   546      547         21
Nebraska                  567      568          9
Nevada                    508      507         31
New Hampshire             520      514         70
New Jersey                498      505         69
New Mexico                554      548         12
New York                  497      499         73
North Carolina            490      486         59
North Dakota              596      599          5
Ohio                      536      535         24
Oklahoma                  566      557          8
Oregon                    523      521         50
Pennsylvania              498      492         71
Rhode Island              501      491         69
South Carolina            480      474         57
South Dakota              574      566          5
Tennessee                 563      552         14
Texas                     495      500         48
Utah                      583      575          4
Vermont                   506      500         70
Virginia                  507      496         68
Washington                519      519         47
West Virginia             526      506         17
Wisconsin                 577      586          8
Wyoming                   544      544         11
National Average          505      508         41

Source: The College Board



8. Verify Equation 9.3.3, which states that

$$\operatorname{Var}(A) = \frac{\sigma^2 \sum_{i=1}^{n} x_i^2}{n\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

9. In Problem 4, 

(a) Estimate the variance of an individual response. 

(b) Determine a 90 percent confidence interval for the variance. 

10. Verify that

$$SS_R = \frac{S_{xx}S_{YY} - S_{xY}^2}{S_{xx}}$$



11. The following table relates the number of sunspots that appeared each year from 
1970-1983 to the number of auto accident deaths during that year. Test the 
hypothesis that the number of auto deaths is not affected by the number of 
sunspots. (The sunspot data are from Jastrow and Thompson, Fundamentals and 
Frontiers of Astronomy, and the auto death data are from General Statistics of the U. S. 
1985.) 
Year   Sunspots   Auto Accident Deaths (1,000s)
 70       165              54.6
 71        89              53.3
 72        55              56.3
 73        34              49.6
 74         9              47.1
 75        30              45.9
 76        59              48.5
 77        83              50.1
 78       109              52.4
 79       127              52.5
 80       153              53.2
 81       112              51.4
 82        80              46
 83        45              44.6



12. The following data set presents the heights of 12 male law school classmates whose 
law school examination scores were roughly equal. It also gives their annual salaries 
5 years after graduation. Each of them went into corporate law. The height is in 
inches and the salary in units of $1,000. 



Height   64   65   66   67    69   70   72    72   74    74    75   76
Salary   91   94   88   103   77   96   105   88   122   102   90   114



(a) Do the above data establish the hypothesis that a lawyer's salary is related to 
his height? Use the 5 percent level of significance. 

(b) What was the null hypothesis in part (a)? 
13. Suppose in the simple linear regression model

$$Y = \alpha + \beta x + e$$

that $0 < \beta < 1$.

(a) Show that if $x < \alpha/(1-\beta)$, then

$$x < E[Y] < \frac{\alpha}{1-\beta}$$

(b) Show that if $x > \alpha/(1-\beta)$, then

$$x > E[Y] > \frac{\alpha}{1-\beta}$$

and conclude that $E[Y]$ is always between $x$ and $\alpha/(1-\beta)$.

14. A study has shown that a good model for the relationship between X and Y , the 
first and second year batting averages of a randomly chosen major league baseball 
player, is given by the equation 

$$Y = .159 + .4X + e$$

where $e$ is a normal random variable with mean 0. That is, the model is
a simple linear regression with a regression toward the mean.

(a) If a player's batting average is .200 in his first year, what would you predict 
for the second year? 

(b) If a player's batting average is .265 in his first year, what would you predict 
for the second year? 

(c) If a player's batting average is .310 in his first year, what would you predict 
for the second year? 

15. Experienced flight instructors have claimed that praise for an exceptionally fine 
landing is typically followed by a poorer landing on the next attempt, whereas 
criticism of a faulty landing is typically followed by an improved landing. Should 
we thus conclude that verbal praise tends to lower performance levels, whereas 
verbal criticism tends to raise them? Or is some other explanation possible? 

16. Verify Equation 9.4.3. 

17. The following data represent the relationship between the number of alignment
errors and the number of missing rivets for 10 different aircraft.

Number of Missing Rivets = x     13   15   10   22   30   7   25   16   20   15
Number of Alignment Errors = y    7    7    5   12   15   2   13    9   11    8

(a) Plot a scatter diagram.

(b) Estimate the regression coefficients.

(c) Test the hypothesis that α = 1.



(d) Estimate the expected number of alignment errors of a plane having 24 
missing rivets. 

(e) Compute a 90 percent confidence interval estimate for the quantity in (d). 

18. The following data give the average price of all books reviewed in the journal 
Science in the years from 1990 to 1996. 



Year              1990    1991    1992    1993    1994    1995    1996
Price (Dollars)   54.43   54.08   57.58   51.21   59.96   60.52   62.13



Give an interval that, with 95 percent confidence, will contain the average price 
of all books reviewed in Science in 1997. 

Problems 19 through 23 refer to the following data relating cigarette smoking 
and death rates for 4 types of cancers in 14 states. The data are based in part on 
records concerning 1960 cigarette tax receipts. 



Cigarette Smoking and Cancer Death Rates

                                  Deaths per Year per 100,000 People
                 Cigarettes    Bladder    Lung      Kidney
State            per Person    Cancer     Cancer    Cancer    Leukemia
California         2,860        4.46      22.07      2.66       7.06
Idaho              2,010        3.08      13.58      2.46       6.62
Illinois           2,791        4.75      22.80      2.95       7.27
Indiana            2,618        4.09      20.30      2.81       7.00
Iowa               2,212        4.23      16.59      2.90       7.69
Kansas             2,184        2.91      16.84      2.88       7.42
Kentucky           2,344        2.86      17.71      2.13       6.41
Massachusetts      2,692        4.69      22.04      3.03       6.89
Minnesota          2,206        3.72      14.20      3.54       8.28
New York           2,914        5.30      25.02      3.10       7.23
Alaska             3,034        3.46      25.88      4.32       4.90
Nevada             4,240        6.54      23.03      2.85       6.67
Utah               1,400        3.31      12.01      2.20       6.71
Texas              2,257        3.21      20.74      2.69       7.02

19. (a) Draw a scatter diagram of cigarette consumption versus death rate from
bladder cancer.

(b) Does the diagram indicate the possibility of a linear relationship? 

(c) Find the best linear fit. 

(d) If next year's average cigarette consumption is 2,500, what is your prediction 
of the death rate from bladder cancer? 

20. (a) Draw a scatter diagram relating cigarette use and death rates from lung
cancer.

(b) Estimate the regression parameters α and β.

(c) Test at the .05 level of significance the hypothesis that cigarette consumption
does not affect the death rate from lung cancer.

(d) What is the p-value of the test in part (c)?

21. (a) Draw a scatter diagram of cigarette use versus death rate from kidney cancer. 

(b) Estimate the regression line. 

(c) What is the p-value in the test that the slope of the regression line is 0?

(d) Determine a 90 percent confidence interval for the mean death rate from 
kidney cancer in a state whose citizens smoke an average of 3,400 cigarettes 
per year. 

22. (a) Draw a scatter diagram of cigarettes smoked versus death rate from leukemia. 

(b) Estimate the regression coefficients. 

(c) Test the hypothesis that there is no regression of the death rate from leukemia 
on the number of cigarettes used. That is, test that β = 0.

(d) Determine a 90 percent prediction interval for the leukemia death rate in 
a state whose citizens smoke an average of 2,500 cigarettes. 
23. (a) Estimate the variances in Problems 19 through 22.

(b) Determine a 95 percent confidence interval for the variance in the data relating 
to lung cancer. 

(c) Break up the lung cancer data into two parts — the first corresponding to 
states whose average cigarette consumption is less than 2,300, and the second 
greater. Assume a linear regression model for both sets of data. How would 
you test the hypothesis that the variance of a response is the same for both 
sets? 

(d) Do the test in part (c) at the .05 level of significance. 

24. Plot the standardized residuals from the data of Problem 1. What does the plot
indicate about the assumptions of the linear regression model? 

25. It is difficult and time-consuming to measure directly the amount of protein in
a liver sample. As a result, medical laboratories often make use of the fact that
the amount of protein is related to the amount of light that would be absorbed
by the sample. Accordingly, a spectrometer shines light on a solution that contains
the liver sample, and the amount of light absorbed is then used to estimate the
amount of protein.

The above procedure was tried on five samples having known amounts of 
protein, with the following data resulting. 



Light Absorbed Amount of Protein (mg) 

.44 2 

.82 16 

1.20 30 

1.61 46 

1.83 55 



(a) Determine the coefficient of determination. 

(b) Does this appear to be a reasonable way of estimating the amount of protein 
in a liver sample? 

(c) What is the estimate of the amount of protein when the light absorbed is 1.5? 

(d) Determine a prediction interval, in which we can have 90 percent confidence, 
for the quantity in part (c). 

26. The determination of the shear strength of spot welds is relatively difficult, whereas
measuring the weld diameter of spot welds is relatively simple. As a result, it would
be advantageous if shear strength could be predicted from a measurement of weld
diameter. The data are as follows:

Shear Strength (psi)   Weld Diameter (.0001 in.)
        370                     400
        780                     800
      1,210                   1,250
      1,560                   1,600
      1,980                   2,000
      2,450                   2,500
      3,070                   3,100
      3,550                   3,600
      3,940                   4,000
      3,950                   4,000

(a) Draw a scatter diagram.

(b) Find the least squares estimates of the regression coefficients.



(c) Test the hypothesis that the slope of the regression line is equal to 1 at the .05
level of significance.

(d) Estimate the expected value of shear strength when the weld diameter is .2500. 

(e) Find a prediction interval such that, with 95 percent confidence, the value 
of shear strength corresponding to a weld diameter of .2250 inch will be 
contained in it. 

(f) Plot the standardized residuals.

(g) Does the plot in part (f) support the assumptions of the model? 

27. A screw manufacturer is interested in giving out data to his customers on the 
relation between nominal and actual lengths. The following results (in inches) 
were observed. 



Nominal x   Actual y
  1/4        .262    .262    .245
  1/2        .496    .512    .490
  3/4        .743    .744    .751
  1          .976   1.010   1.004
  1 1/4     1.265   1.254   1.252
  1 1/2     1.498   1.518   1.504
  1 3/4     1.738   1.759   1.750
  2         2.005   1.992   1.992



(a) Estimate the regression coefficients. 

(b) Estimate the variance involved in manufacturing a screw. 

(c) For a large set of nominal 1-inch screws, find a 90 percent confidence interval 
for the average length. 
(d) For a nominal 1-inch screw, find a 90 percent prediction interval for its actual
length.

(e) Plot the standardized residuals. 

(f) Do the residuals in part (e) indicate any flaw in the regression model?

(g) Determine the index of fit. 

28. Glass plays a key role in criminal investigations, because criminal activity often 
results in the breakage of windows and other glass objects. Since glass fragments 
often lodge in the clothing of the criminal, it is of great importance to be able 
to identify such fragments as originating at the scene of the crime. Two physical 
properties of glass that are useful for identification purposes are its refractive index, 
which is relatively easy to measure, and its density, which is much more difficult 
to measure. The exact measurement of density is, however, greatly facilitated if 
one has a good estimate of this value before setting up the laboratory experiment 
needed to determine it exactly. Thus, it would be quite useful if one could use the 
refractive index of a glass fragment to estimate its density. 

The following data relate the refractive index to the density for 18 pieces of 
glass. 



Refractive Index   Density   Refractive Index   Density
    1.5139         2.4801        1.5161         2.4843
    1.5153         2.4819        1.5165         2.4858
    1.5155         2.4791        1.5178         2.4950
    1.5155         2.4796        1.5181         2.4922
    1.5156         2.4773        1.5191         2.5035
    1.5157         2.4811        1.5227         2.5086
    1.5158         2.4765        1.5227         2.5117
    1.5159         2.4781        1.5232         2.5146
    1.5160         2.4909        1.5253         2.5187



(a) Predict the density of a piece of glass with a refractive index 1.52. 

(b) Determine an interval that, with 95 percent confidence, will contain the 
density of the glass in part (a).



29. The regression model 



$$Y = \beta x + e, \quad e \sim N(0, \sigma^2)$$

is called regression through the origin since it presupposes that the expected
response corresponding to the input level $x = 0$ is equal to 0. Suppose that
$(x_i, Y_i)$, $i = 1, \ldots, n$ is a data set from this model.

(a) Determine the least squares estimator $B$ of $\beta$.

(b) What is the distribution of $B$?
(c) Define $SS_R$ and give its distribution.

(d) Derive a test of $H_0: \beta = \beta_0$ versus $H_1: \beta \neq \beta_0$.

(e) Determine a $100(1-\alpha)$ percent prediction interval for $Y(x_0)$, the response
at input level $x_0$.

30. Prove the identity 

$$R^2 = \frac{S_{xY}^2}{S_{xx}S_{YY}}$$



31. The weight and systolic blood pressure of randomly selected males in age-group 
25 to 30 are shown in the following table. 



Subject   Weight   Systolic BP     Subject   Weight   Systolic BP
   1        165        130            11       172        153
   2        167        133            12       159        128
   3        180        150            13       168        132
   4        155        128            14       174        149
   5        212        151            15       183        158
   6        175        146            16       215        150
   7        190        150            17       195        163
   8        210        140            18       180        156
   9        200        148            19       143        124
  10        149        125            20       240        170



(a) Estimate the regression coefficients. 

(b) Do the data support the claim that systolic blood pressure does not depend 
on an individual's weight? 

(c) If a large number of males weighing 182 pounds have their blood pressures 
taken, determine an interval that, with 95 percent confidence, will contain 
their average blood pressure. 

(d) Analyze the standardized residuals. 

(e) Determine the sample correlation coefficient. 

32. It has been determined that the relation between stress ($S$) and the number of
cycles to failure ($N$) for a particular type of alloy is given by

$$S = \frac{A}{N^m}$$

where $A$ and $m$ are unknown constants. An experiment is run yielding the following
data.

Stress (thousand psi)   N (million cycles to failure)
        55.0                     .223
        50.5                     .925
        43.5                    6.75
        42.5                   18.1
        42.0                   29.1
        41.0                   50.5
        35.7                  126
        34.5                  215
        33.0                  445
        32.0                  420

Estimate A and m.

33. In 1957 the Dutch industrial engineer J. R. Dejong proposed the following model 
for the time it takes to perform a simple manual task as a function of the number 
of times the task has been practiced: 

$$T \approx t s^{-n}$$

where T is the time, n is the number of times the task has been practiced, and t 
and s are parameters depending on the task and individual. Estimate t and s for 
the following data set. 

T 22.4 21.3 19.7 15.6 15.2 13.9 13.7 
n 1 2 3 4 5 6 

34. The chlorine residual in a swimming pool at various times after being cleaned is 
as given: 





Time (hr)   Chlorine Residual (pt/million)
    2                  1.8
    4                  1.5
    6                  1.45
    8                  1.42
   10                  1.38
   12                  1.36

Fit a curve of the form

$$Y \approx a e^{-bx}$$

What would you predict for the chlorine residual 15 hours after a cleaning?
35. The proportion of a given heat rise that has dissipated a time t after the source is
cut off is of the form 



for some unknown constant a. Given the data 

P .07 .21 .32 .38 .40 .45 .51 
t .1 .2 .3 .4 .5 .6 .7 

estimate the value of a. Estimate the value of t at which half of the heat rise is
dissipated.

36. The following data represent the bacterial count of five individuals at different 
times after being inoculated by a vaccine consisting of the bacteria. 



Days Since Inoculation Bacterial Count 

3 121,000 

6 134,000 

7 147,000 

8 210,000 

9 330,000 

(a) Fit a curve. 

(b) Estimate the bacteria count of a new patient after 8 days. 

37. The following data yield the amount of hydrogen present (in parts per million) 
in core drillings of fixed size at the following distances (in feet) from the base of 
a vacuum-cast ingot. 



Distance   1      2      3      4     5     6     7     8     9     10
Amount     1.28   1.50   1.12   .94   .82   .75   .60   .72   .95   1.20

(a) Draw a scatter diagram.

(b) Fit a curve of the form

$$Y = \alpha + \beta x + \gamma x^2 + e$$

to the data.



38. A new drug was tested on mice to determine its effectiveness in reducing cancerous 
tumors. Tests were run on 10 mice, each having a tumor of size 4 grams, by 
varying the amount of the drug used and then determining the resulting reduction 
in the weight of the tumor. The data were as follows:



Coded Amount of Drug Tumor Weight Reduction 

1 .50 

2 .90 

3 1.20 

4 1.35 

5 1.50 

6 1.60 

7 1.53 

8 1.38 

9 1.21 
10 .65 



Estimate the maximum expected tumor reduction and the amount of the drug 
that attains it by fitting a quadratic regression equation of the form 

$Y = \beta_0 + \beta_1 x + \beta_2 x^2 + e$
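Once the quadratic is fitted, the maximizing amount is the vertex $x^* = -\beta_1/(2\beta_2)$. This is a maximum only when the fitted $\beta_2$ is negative. The sketch below is one possible approach, assuming least squares. It solves the 3 by 3 normal equations by Cramer's rule. The names `det3` and `fit_quadratic` are hypothetical.

```python
def det3(m):
    """Determinant of a 3 by 3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def fit_quadratic(xs, ys):
    """Return (b0, b1, b2) minimizing sum (y - b0 - b1 x - b2 x^2)^2, via Cramer's rule."""
    s = [sum(x ** k for x in xs) for k in range(5)]                  # sums of x^0 .. x^4
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]  # sums of y x^0 .. y x^2
    normal = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    d = det3(normal)
    coeffs = []
    for col in range(3):
        replaced = [row[:] for row in normal]
        for r in range(3):
            replaced[r][col] = t[r]      # replace one column by the right-hand side
        coeffs.append(det3(replaced) / d)
    return tuple(coeffs)

dose = list(range(1, 11))   # coded amount of drug
reduction = [.50, .90, 1.20, 1.35, 1.50, 1.60, 1.53, 1.38, 1.21, .65]
b0, b1, b2 = fit_quadratic(dose, reduction)
x_star = -b1 / (2 * b2)                      # vertex of the fitted parabola
y_star = b0 + b1 * x_star + b2 * x_star ** 2
print(f"max expected reduction {y_star:.3f} at coded amount {x_star:.2f}")
```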

39. The following data represent the relation between the number of cans damaged in 
a boxcar shipment of cans and the speed of the boxcar at impact. 



Speed Number of Cans Damaged 

3 54 

3 62 

3 65 

5 94 

5 122 

5 84 

6 142 

7 139 

7 184 

8 254 



(a) Analyze as a simple linear regression model. 

(b) Plot the standardized residuals. 

(c) Do the results of part (b) indicate any flaw in the model? 

(d) If the answer to part (c) is yes, suggest a better model and estimate all resulting 
parameters. 
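The following is a sketch for parts (a) and (b). It assumes the common convention of dividing each residual $Y_i - (A + Bx_i)$ by $\sqrt{SS_R/(n-2)}$. Part (c) asks about systematic patterns, such as curvature, in these values plotted against speed. The function name is hypothetical.

```python
import math

def simple_linear_regression(xs, ys):
    """Least-squares estimates A, B of alpha, beta and the residual sum of squares SS_R."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    B = s_xy / s_xx
    A = y_bar - B * x_bar
    ss_r = sum((y - A - B * x) ** 2 for x, y in zip(xs, ys))
    return A, B, ss_r

speed = [3, 3, 3, 5, 5, 5, 6, 7, 7, 8]
damaged = [54, 62, 65, 94, 122, 84, 142, 139, 184, 254]
A, B, ss_r = simple_linear_regression(speed, damaged)
scale = math.sqrt(ss_r / (len(speed) - 2))
standardized = [(y - (A + B * x)) / scale for x, y in zip(speed, damaged)]
for x, r in zip(speed, standardized):
    print(f"speed {x}: standardized residual {r:+.2f}")
```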
40. Redo Problem 5 under the assumption that the variance of the gain in reading
speed is proportional to the number of weeks in the program. 

41. The following data relate the proportions of coal miners who exhibit symptoms of 
pneumoconiosis to the number of years of working in coal mines. 



Years Working Proportion Having Pneumoconiosis

5 

10 .0090 

15 .0185 

20 .0672 

25 .1542 

30 .1720 

35 .1840 

40 .2105 

45 .3570 

50 .4545 



Estimate the probability that a coal miner who has worked for 42 years will have 
pneumoconiosis. 

42. The following data set refers to Example 9.8c. 



Number of Cars   Number of Accidents
(Daily)          (Monthly)
2,000            15
2,300            27
2,500            20
2,600            21
2,800            31
3,000            16
3,100            22
3,400            23
3,700            40
3,800            39
4,000            27
4,600            43
4,800            53



(a) Estimate the number of accidents in a month when the number of cars using 
the highway is 3,500. 
(b) Use the model

$\sqrt{Y} = \alpha + \beta x + e$

and redo part (a).
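This sketch fits both models by least squares and predicts at 3,500 cars. One assumption to note: squaring the fitted value of $\sqrt{Y}$ estimates the square of the mean of $\sqrt{Y}$. That quantity is somewhat smaller than the mean of $Y$, so the two predictions do not estimate exactly the same thing. The helper name is hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares line of ys on xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - slope * x_bar, slope

cars = [2000, 2300, 2500, 2600, 2800, 3000, 3100, 3400, 3700, 3800, 4000, 4600, 4800]
accidents = [15, 27, 20, 21, 31, 16, 22, 23, 40, 39, 27, 43, 53]

a1, b1 = least_squares_line(cars, accidents)                      # Y = alpha + beta x + e
a2, b2 = least_squares_line(cars, [y ** 0.5 for y in accidents])  # sqrt(Y) = alpha + beta x + e
pred_linear = a1 + b1 * 3500
pred_sqrt = (a2 + b2 * 3500) ** 2   # back to the original scale by squaring
print(f"linear model: {pred_linear:.2f}, square-root model: {pred_sqrt:.2f}")
```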

43. The peak discharge of a river is an important parameter for many engineering 
design problems. Estimates of this parameter can be obtained by relating it to the 
watershed area ($x_1$) and watershed slope ($x_2$). Estimate the relationship based on
the following data.

$x_1$ (m²)   $x_2$ (ft/ft)   Peak Discharge (ft³/sec)
36           .005            50
37           .040            40
45           .004            45
87           .002            110
450          .004            490
550          .001            400
1,200        .002            650
4,000        .0005           1,550



44. The sediment load in a stream is related to the size of the contributing drainage
area ($x_1$) and the average stream discharge ($x_2$). Estimate this relationship using
the following data.

Area (×10³ mi²)   Discharge (ft³/sec)   Sediment Yield (Millions of tons/yr)
8                 65                    1.8
19                625                   6.4
31                1,450                 3.3
16                2,400                 1.4
41                6,700                 10.8
24                8,500                 15.0
3                 1,550                 1.7
3                 3,500                 .8
3                 4,300                 .4
7                 12,100                1.6



45. Fit a multiple linear regression equation to the following data set. 
$x_1$   $x_2$   $x_3$   $x_4$   $y$
1       11      16      4       275
2       10      9       3       183
3       9       4       2       140
4       8       1       1       82
5       7       2       1       97
6       6       1       -1      122
7       5       4       -2      146
8       4       9       -3      246
9       3       16      -4      359
10      2       25      -5      482



46. The following data refer to Stanford heart transplants. They relate the survival time
of patients who have received heart transplants to their age when the transplant
occurred and to a so-called mismatch score that is supposed to be an indicator of
how well the transplanted heart should fit the recipient.

Survival Time (in days)   Mismatch Score   Age
624                       1.32             51.0
46                        .61              42.5
64                        1.89             54.6
1,350                     .87              54.1
280                       1.12             49.5
10                        2.76             55.3
1,024                     1.13             43.4
39                        1.38             42.8
730                       .96              58.4
136                       1.62             52.0
836                       1.58             45.0
60                        .69              64.5



(a) Letting the dependent variable be the logarithm of the survival time, fit 
a regression on the independent variables mismatch score and age. 

(b) Estimate the variance of the error term. 

47. (a) Fit a multiple linear regression equation to the following data set. 

(b) Test the hypothesis that $\beta_0 = 0$.

(c) Test the hypothesis that /% = 0. 

(d) Test the hypothesis that the mean response at the input levels $x_1 = x_2 =
x_3 = 1$ is 8.5.
$x_1$   $x_2$   $x_3$   $y$
7.1     .68     4       41.53
9.9     .64     1       63.75
3.6     .58     1       16.38
9.3     .21     3       45.54
2.3     .89     5       15.52
4.6     .00     8       28.55
.2      .37     5       5.65
5.4     .11     3       25.02
8.2     .87     4       52.49
7.1     .00     6       38.05
4.7     .76             30.76
5.4     .87     8       39.69
1.7     .52     1       17.59
1.9     .31     3       13.22
9.2     .19     5       50.98



48. The tensile strength of a certain synthetic fiber is thought to be related to $x_1$, the
percentage of cotton in the fiber, and $x_2$, the drying time of the fiber. A test of 10
pieces of fiber produced under different conditions yielded the following results.

$Y$ = Tensile Strength   $x_1$ = Percentage of Cotton   $x_2$ = Drying Time
213                      13                             2.1
220                      15                             2.3
216                      14                             2.2
225                      18                             2.5
235                      19                             3.2
218                      20                             2.4
239                      22                             3.4
243                      17                             4.1
233                      16                             4.0
240                      18                             4.3



(a) Fit a multiple regression equation. 

(b) Determine a 90 percent confidence interval for the mean tensile strength of 
a synthetic fiber having 21 percent cotton whose drying time is 3.6. 
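For the multiple regression problems, the least squares estimates solve the normal equations $(X'X)B = X'Y$. Each row of $X$ is $[1, x_1, \ldots, x_k]$. The sketch below applies this to the fiber data of Problem 48. Function names are hypothetical.

A few limitations apply. The confidence interval in part (b) would also require $(X'X)^{-1}$ and a $t$ percentile, and those are not computed here. Plain elimination also breaks down when the columns are exactly collinear. For instance, in the data of Problem 45 as printed, $x_1 + x_2 = 12$ in every row.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x

def multiple_regression(rows, ys):
    """Least-squares (B0, B1, ..., Bk) from the normal equations (X'X) B = X'Y."""
    design = [[1.0] + list(r) for r in rows]     # leading 1 for the intercept
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(p)]
    return solve_linear(xtx, xty)

strength = [213, 220, 216, 225, 235, 218, 239, 243, 233, 240]
cotton_and_time = [(13, 2.1), (15, 2.3), (14, 2.2), (18, 2.5), (19, 3.2),
                   (20, 2.4), (22, 3.4), (17, 4.1), (16, 4.0), (18, 4.3)]
b0, b1, b2 = multiple_regression(cotton_and_time, strength)
residuals = [y - (b0 + b1 * x1 + b2 * x2) for (x1, x2), y in zip(cotton_and_time, strength)]
s_squared = sum(r * r for r in residuals) / (len(strength) - 3)  # SS_R / (n - k - 1), k = 2
print(f"Y = {b0:.3f} + {b1:.3f} x1 + {b2:.3f} x2, s^2 = {s_squared:.3f}")
print(f"estimated mean strength at x1 = 21, x2 = 3.6: {b0 + b1 * 21 + b2 * 3.6:.2f}")
```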

49. The time to failure of a machine component is related to the operating voltage
($x_1$), the motor speed in revolutions per minute ($x_2$), and the operating
temperature ($x_3$).

A designed experiment is run in the research and development laboratory, and the
following data, where $y$ is the time to failure in minutes, are obtained.

$y$      $x_1$   $x_2$    $x_3$
2,145    110     750      140
2,155    110     850      180
2,220    110     1,000    140
2,225    110     1,100    180
2,260    120     750      140
2,266    120     850      180
2,334    120     1,000    140
2,340    130     1,000    180
2,212    115     840      150
2,180    115     880      150



(a) Fit a multiple regression model to these data. 

(b) Estimate the error variance. 

(c) Determine a 95 percent confidence interval for the mean time to failure 
when the operating voltage is 125, the motor speed is 900, and the operating 
temperature is 160. 

50. Explain why, for the same data, a prediction interval for a future response always 
contains the corresponding confidence interval for the mean response. 

51. Consider the following data set. 



$x_1$    $x_2$   $y$
5.1      2       55.42
5.4      8       100.21
5.9      -2      27.07
6.6      12      169.95
7.5      -6      -17.93
8.6      16      197.77
9.9      -10     -25.66
11.4     20      264.18
13.1     -14     -53.88
15       24      317.84
17.1     -18     -72.53
19.4     28      385.53

(a) Fit a linear relationship between $y$ and $x_1, x_2$.

(b) Determine the variance of the error term.

(c) Determine an interval that, with 95 percent confidence, will contain the
response when the inputs are $x_1 = 10.2$ and $x_2 = 17$.

52. The cost of producing power per kilowatt hour is a function of the load factor 
and the cost of coal in cents per million Btu. The following data were obtained 
from 12 mills. 



Load Factor    Cost of   Power
(in percent)   Coal      Cost
84             14        4.1
81             16        4.4
73             22        5.6
74             24        5.1
67             20        5.0
87             29        5.3
77             26        5.4
76             15        4.8
69             29        6.1
82             24        5.5
90             25        4.7
88             13        3.9



(a) Estimate the relationship. 

(b) Test the hypothesis that the coefficient of the load factor is equal to 0. 

(c) Determine a 95 percent prediction interval for the power cost when the load 
factor is 85 and the coal cost is 20. 

53. The following data relate the systolic blood pressure to the age ($x_1$) and weight
($x_2$) of a set of individuals of similar body type and lifestyle.

Age   Weight   Blood Pressure
25    162      112
25    184      144
42    166      138
55    150      145
30    192      152
40    155      110
66    184      118
60    202      160
38    174      108



(a) Test the hypothesis that, when an individual's weight is known, age gives 
no additional information in predicting blood pressure. 
(b) Determine an interval that, with 95 percent confidence, will contain the
average blood pressure of all individuals of the preceding type that are 
45 years old and weigh 180 pounds. 

(c) Determine an interval that, with 95 percent confidence, will contain the 
blood pressure of a given individual of the preceding type who is 45 years old 
and weighs 180 pounds. 

54. A recently completed study attempted to relate job satisfaction to income (in 
1,000s) and seniority for a random sample of 9 municipal workers. The job sat- 
isfaction value given for each worker is his or her own assessment of such, with a 
score of 1 being the lowest and 10 being the highest. The following data resulted. 



Yearly Income   Years on the Job   Job Satisfaction
27              8                  5.6
22              4                  6.3
34              12                 6.8
28              9                  6.7
36              16                 7.0
39              14                 7.7
33              10                 7.0
42              15                 8.0
46              22                 7.8



(a) Estimate the regression parameters. 

(b) What qualitative conclusions can you draw about how job satisfaction changes 
when income remains fixed and the number of years of service increases? 

(c) Predict the job satisfaction of an employee who has spent 5 years on the job 
and earns a yearly salary of $31,000. 

55. Suppose in Problem 54 that job satisfaction was related solely to years on the job, 
with the following data resulting. 



Years on the Job   Job Satisfaction
8                  5.6
4                  6.3
12                 6.8
9                  6.7
16                 7.0
14                 7.7
10                 7.0
15                 8.0
22                 7.8

(a) Estimate the regression parameters $\alpha$ and $\beta$.

(b) What is the qualitative relationship between years of service and job 
satisfaction? That is, what appears to happen to job satisfaction as service 
increases? 

(c) Compare your answer to part (b) with the answer you obtained in part (b) of 
Problem 54. 

(d) What conclusion, if any, can you draw from your answer in part (c)? 

56. For the logistic regression model, find the value $x$ such that $p(x) = .5$.

57. A study of 64 prematurely born infants was interested in the relation between 
the gestational age (in weeks) of the infant at birth and whether the infant was 
breast-feeding at the time of release from the birthing hospital. The following data 
resulted: 



Gestational Age   Frequency   Number Breast-Feeding
28                6           2
29                5           2
30                9           7
31                9           7
32                20          16
33                15          14

In the preceding, the frequency column refers to the number of babies born after
the specified gestational number of weeks.

(a) Explain how the relationship between gestational age and whether the infant
was breast-feeding can be analyzed via a logistic regression model.

(b) Use appropriate software to estimate the parameters for this model. 

(c) Estimate the probability that a newborn with a gestational age of 29 weeks 
will be breast-feeding. 

58. Twelve first-time heart attack victims were given a test that measures internal
anger. The following data relate their scores and whether they had a second heart
attack within 5 years.

Anger Score   Second Heart Attack
80            yes
77            yes
70            no
68            yes
64            no
60            yes
50            yes
46            no
40            yes
35            no
30            no
25            yes

(a) Explain how the relationship between a second heart attack and one's anger
score can be analyzed via a logistic regression model.

(b) Use appropriate software to estimate the parameters for this model. 

(c) Estimate the probability that a heart attack victim with an anger score of 55 
will have a second attack within 5 years. 
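Problems 57 and 58 call for software. One possibility is sketched below. It assumes the parameterization $p(x) = e^{\alpha + \beta x}/(1 + e^{\alpha + \beta x})$ and maximizes the binomial log-likelihood by Newton-Raphson. The function names are hypothetical. For Problem 58, each victim is treated as a single trial. For the grouped data of Problem 57, the frequency column is passed as the number of trials.

```python
import math

def fit_logistic(xs, successes, trials, iterations=100):
    """Maximum likelihood (alpha, beta) for p(x) = exp(alpha + beta x) / (1 + exp(alpha + beta x)),
    computed by Newton-Raphson on the binomial log-likelihood."""
    alpha = beta = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, s, n in zip(xs, successes, trials):
            p = 1.0 / (1.0 + math.exp(-(alpha + beta * x)))
            w = n * p * (1.0 - p)
            g0 += s - n * p              # derivative of the log-likelihood in alpha
            g1 += (s - n * p) * x        # derivative in beta
            h00 += w                     # entries of the negated Hessian
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        step_a = (h11 * g0 - h01 * g1) / det
        step_b = (h00 * g1 - h01 * g0) / det
        alpha, beta = alpha + step_a, beta + step_b
        if abs(step_a) + abs(step_b) < 1e-12:
            break
    return alpha, beta

def prob(alpha, beta, x):
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))

# Problem 58: one trial per victim, success = second heart attack within 5 years
anger = [80, 77, 70, 68, 64, 60, 50, 46, 40, 35, 30, 25]
attack = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1]
a58, b58 = fit_logistic(anger, attack, [1] * len(anger))
print(f"P(second attack | score 55) = {prob(a58, b58, 55):.3f}")

# Problem 57: grouped data, number breast-feeding out of the frequency at each age
age = [28, 29, 30, 31, 32, 33]
feeding = [2, 2, 7, 7, 16, 14]
freq = [6, 5, 9, 9, 20, 15]
a57, b57 = fit_logistic(age, feeding, freq)
print(f"P(breast-feeding | 29 weeks) = {prob(a57, b57, 29):.3f}")
```

At the maximum the score equations hold. In particular, the fitted probabilities sum to the observed number of successes, which gives a quick check on convergence.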
ANALYSIS OF VARIANCE



10.1 INTRODUCTION

A large company is considering purchasing, in quantity, one of four different computer 
packages designed to teach a new programming language. Some influential people within 
this company have claimed that these packages are basically interchangeable in that the one 
chosen will have little effect on the final competence of its user. To test this hypothesis the 
company has decided to choose 160 of its engineers, and divide them into 4 groups of size 
40. Each member in group i will then be given teaching package i, i = 1, 2, 3, 4, to learn 
the new language. When all the engineers complete their study, a comprehensive exam 
will be given. The company then wants to use the results of this examination to determine 
whether the computer teaching packages are really interchangeable or not. How can they 
do this? 

Before answering this question, let us note that we clearly desire to be able to conclude 
that the teaching packages are indeed interchangeable when the average test scores in all 
the groups are similar and to conclude that the packages are essentially different when 
there is a large variation among these average test scores. However, to be able to reach 
such a conclusion, we should note that the method of division of the 160 engineers 
into 4 groups is of vital importance. For example, suppose that the members of the first 
group score significantly higher than those of the other groups. What can we conclude 
from this? Specifically, is this result due to teaching package 1 being a superior teaching 
package, or is it due to the fact that the engineers in group 1 are just better learners? To be 
able to conclude the former, it is essential that we divide the 160 engineers into the 4 groups 
in such a way as to make it extremely unlikely that one of these groups is inherently
superior. The time-tested method for doing this is to divide the engineers into 4 groups
in a completely random fashion. That is, we should do it in such a way that all possible
divisions are equally likely; for in this case, it would be very unlikely that any one group 
would be significantly superior to any other group. So let us suppose that the division of the 
engineers was indeed done "at random." (Whereas it is not at all obvious how this can be 
accomplished, one efficient procedure is to start by arbitrarily numbering the 160 engineers. 
Then generate a random permutation of the integers 1, 2, ..., 160 and put the engineers
whose numbers are among the first 40 of the permutation into group 1, those whose
numbers are among the 41st through the 80th of the permutation into group 2, and so on.)
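The random division described above can be sketched with Python's standard `random` module, whose `shuffle` rearranges a list into a random order. The seed and the function name are illustrative.

```python
import random

def random_groups(num_people=160, num_groups=4, seed=None):
    """Split people numbered 1..num_people into equal groups via a random permutation."""
    rng = random.Random(seed)
    perm = list(range(1, num_people + 1))
    rng.shuffle(perm)                        # random ordering of 1..num_people
    size = num_people // num_groups
    # group g receives the people in positions g*size .. (g+1)*size - 1 of the permutation
    return [sorted(perm[g * size:(g + 1) * size]) for g in range(num_groups)]

groups = random_groups(seed=2024)
print([len(g) for g in groups])   # [40, 40, 40, 40]
```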

It is now probably reasonable to suppose that the test score of a given individual 
should be approximately a normal random variable having parameters that depend on 
the package from which he was taught. Also, it is probably reasonable to suppose that 
whereas the average test score of an engineer will depend on the teaching package she 
was exposed to, the variability in the test score will result from the inherent variation
of 160 different people and not from the particular package used. Thus, if we let
$X_{ij}$, $i = 1, \ldots, 4$, $j = 1, \ldots, 40$, denote the test score of the $j$th engineer in group $i$,
a reasonable model might be to suppose that the $X_{ij}$ are independent random variables
with $X_{ij}$ having a normal distribution with unknown mean $\mu_i$ and unknown variance $\sigma^2$.
The hypothesis that the teaching packages are interchangeable is then equivalent to the
hypothesis that $\mu_1 = \mu_2 = \mu_3 = \mu_4$.

In this chapter, we present a technique that can be used to test such a hypothesis. This 
technique, which is rather general and can be used to make inferences about a multitude 
of parameters relating to population means, is known as the analysis of variance. 

10.2 AN OVERVIEW 

Whereas hypothesis tests concerning two population means were studied in Chapter 8, 
tests concerning multiple population means will be considered in the present chapter. In 
Section 10.3, we suppose that we have been provided samples of size n from m distinct 
populations and that we want to use these data to test the hypothesis that the m population 
means are equal. Since the mean of a random variable depends only on a single factor, 
namely, the sample the variable is from, this scenario is said to constitute a one-way 
analysis of variance. A procedure for testing the hypothesis is presented. In addition, in 
Section 10.3.1 we show how to obtain multiple comparisons of the $\binom{m}{2}$ differences between
the pairs of population means; and in Section 10.3.2 we show how the equal means 
hypothesis can be tested when the m sample sizes are not all equal. 

In Sections 10.4 and 10.5, we consider models that assume that there are two factors 
that determine the mean value of a variable. In these models, the variables can be thought 
of as being arranged in a rectangular array, with the mean value of a specified variable 
depending both on the row and on the column in which it is located. Such a model is 
called a two-way analysis of variance. In these sections we suppose that the mean value of a 
variable depends on its row and column in an additive fashion; specifically, that the mean 
of the variable in row $i$, column $j$ can be written as $\mu + \alpha_i + \beta_j$. In Section 10.4, we
show how to estimate these parameters, and in Section 10.5 how to test hypotheses to the 
effect that a given factor — either the row or the column in which a variable is located — 
does not affect the mean. In Section 10.6, we consider the situation where the mean of 
a variable is allowed to depend on its row and column in a nonlinear fashion, thus allowing 
for a possible interaction between the two factors. We show how to test the hypothesis that 
there is no interaction, as well as ones concerning the lack of a row effect and the lack of 
a column effect on the mean value of a variable. 
In all of the models considered in this chapter, we assume that the data are normally
distributed with the same (although unknown) variance $\sigma^2$. The analysis of variance
approach for testing a null hypothesis $H_0$ concerning multiple parameters relating to the
population means is based on deriving two estimators of the common variance $\sigma^2$. The
first estimator is a valid estimator of $\sigma^2$ whether the null hypothesis is true or not, while
the second one is a valid estimator only when $H_0$ is true. In addition, when $H_0$ is not true
this latter estimator will tend to exceed $\sigma^2$. The test will be to compare the values of these
two estimators, and to reject $H_0$ when the ratio of the second estimator to the first one is
sufficiently large. In other words, since the two estimators should be close to each other
when $H_0$ is true (because they both estimate $\sigma^2$ in this case) whereas the second estimator
should tend to be larger than the first when $H_0$ is not true, it is natural to reject $H_0$ when
the second estimator is significantly larger than the first.

We will obtain estimators of the variance $\sigma^2$ by making use of certain facts concerning
chi-square random variables, which we now present. Suppose that $X_1, \ldots, X_N$ are
independent normal random variables having possibly different means but a common variance
$\sigma^2$, and let $\mu_i = E[X_i]$, $i = 1, \ldots, N$. Since the variables

$Z_i = (X_i - \mu_i)/\sigma, \quad i = 1, \ldots, N$

have standard normal distributions, it follows from the definition of a chi-square random
variable that

$$\sum_{i=1}^{N} Z_i^2 = \sum_{i=1}^{N} (X_i - \mu_i)^2/\sigma^2 \qquad (10.2.1)$$

is a chi-square random variable with $N$ degrees of freedom. Now, suppose that each of the
values $\mu_i$, $i = 1, \ldots, N$, can be expressed as a linear function of a fixed set of $k$ unknown
parameters. Suppose, further, that we can determine estimators of these $k$ parameters,
which thus gives us estimators of the mean values $\mu_i$. If we let $\hat{\mu}_i$ denote the resulting
estimator of $\mu_i$, $i = 1, \ldots, N$, then it can be shown that the quantity

$$\sum_{i=1}^{N} (X_i - \hat{\mu}_i)^2/\sigma^2$$

will have a chi-square distribution with $N - k$ degrees of freedom.
In other words, we start with

$$\sum_{i=1}^{N} (X_i - E[X_i])^2/\sigma^2$$

which is a chi-square random variable with $N$ degrees of freedom. If we now write each
$E[X_i]$ as a linear function of $k$ parameters and then replace each of these parameters by its
estimator, then the resulting expression remains chi-square but with a degree of freedom
that is reduced by 1 for each parameter that is replaced by its estimator.

For an illustration of the preceding, consider the case where all the means are known
to be equal; that is,

$E[X_i] = \mu, \quad i = 1, \ldots, N$

Thus $k = 1$, because there is only one parameter that needs to be estimated. Substituting
$\bar{X}$, the estimator of the common mean $\mu$, for $\mu_i$ in Equation 10.2.1 results in the quantity

$$\sum_{i=1}^{N} (X_i - \bar{X})^2/\sigma^2 \qquad (10.2.2)$$

and the conclusion is that this quantity is a chi-square random variable with $N - 1$
degrees of freedom. But in this case, where all the means are equal, it follows that the
data $X_1, \ldots, X_N$ constitute a sample from a normal population, and thus the quantity in
Equation 10.2.2 is equal to $(N - 1)S^2/\sigma^2$, where $S^2$ is the sample variance. In other words,
the conclusion in this case is just the well-known result (see Section 6.5.2) that $(N - 1)S^2/\sigma^2$
is a chi-square random variable with $N - 1$ degrees of freedom.

10.3 ONE-WAY ANALYSIS OF VARIANCE 

Consider $m$ independent samples, each of size $n$, where the members of the $i$th sample,
$X_{i1}, X_{i2}, \ldots, X_{in}$, are normal random variables with unknown mean $\mu_i$ and unknown
variance $\sigma^2$. That is,

$X_{ij} \sim N(\mu_i, \sigma^2), \quad i = 1, \ldots, m, \quad j = 1, \ldots, n$

We will be interested in testing

$H_0: \mu_1 = \mu_2 = \cdots = \mu_m$

versus

$H_1$: not all the means are equal

That is, we will be testing the null hypothesis that all the population means are equal
against the alternative that at least two of them differ. One way of thinking about this is
to imagine that we have $m$ different treatments, where the result of applying treatment
$i$ on an item is a normal random variable with mean $\mu_i$ and variance $\sigma^2$. We are then
interested in testing the hypothesis that all treatments have the same effect, by applying
each treatment to a (different) sample of $n$ items and then analyzing the result.
Since there are a total of $nm$ independent normal random variables $X_{ij}$, it follows that
the sum of the squares of their standardized versions will be a chi-square random variable
with $nm$ degrees of freedom. That is,

$$\sum_{i=1}^{m}\sum_{j=1}^{n} (X_{ij} - E[X_{ij}])^2/\sigma^2 = \sum_{i=1}^{m}\sum_{j=1}^{n} (X_{ij} - \mu_i)^2/\sigma^2 \sim \chi^2_{nm} \qquad (10.3.1)$$

To obtain estimators for the $m$ unknown parameters $\mu_1, \ldots, \mu_m$, let $X_{i.}$ denote the
average of all the elements in sample $i$; that is,

$$X_{i.} = \sum_{j=1}^{n} X_{ij}/n$$

The variable $X_{i.}$ is the sample mean of the $i$th population, and as such is the estimator of
the population mean $\mu_i$, for $i = 1, \ldots, m$. Hence, if in Equation 10.3.1 we substitute
the estimators $X_{i.}$ for the means $\mu_i$, for $i = 1, \ldots, m$, then the resulting variable

$$\sum_{i=1}^{m}\sum_{j=1}^{n} (X_{ij} - X_{i.})^2/\sigma^2 \qquad (10.3.2)$$

will have a chi-square distribution with $nm - m$ degrees of freedom. (Recall that 1 degree
of freedom is lost for each parameter that is estimated.) Let

$$SS_W = \sum_{i=1}^{m}\sum_{j=1}^{n} (X_{ij} - X_{i.})^2$$

and so the variable in Equation 10.3.2 is $SS_W/\sigma^2$. Because the expected value of a
chi-square random variable is equal to its number of degrees of freedom, it follows upon
taking the expectation of the variable in 10.3.2 that

$E[SS_W]/\sigma^2 = nm - m$

or, equivalently,

$E[SS_W/(nm - m)] = \sigma^2$

We thus have our first estimator of $\sigma^2$, namely, $SS_W/(nm - m)$. Also, note that this
estimator was obtained without assuming anything about the truth or falsity of the null
hypothesis.
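A quick simulation can make this claim concrete. Even with deliberately unequal means, so that $H_0$ is false, the average of $SS_W/(nm - m)$ over many simulated data sets should be close to $\sigma^2$. The means, $\sigma$, $n$, and number of repetitions below are arbitrary choices for illustration.

```python
import random

def ss_within(samples):
    """SS_W: squared deviations of each observation from its own sample mean."""
    total = 0.0
    for sample in samples:
        mean = sum(sample) / len(sample)
        total += sum((x - mean) ** 2 for x in sample)
    return total

rng = random.Random(1)
means = [0.0, 3.0, 10.0]        # hypothetical, deliberately unequal population means
sigma, n, reps = 2.0, 5, 20000
m = len(means)
average_estimate = sum(
    ss_within([[rng.gauss(mu, sigma) for _ in range(n)] for mu in means]) / (n * m - m)
    for _ in range(reps)
) / reps
print(f"average of SS_W/(nm - m): {average_estimate:.3f}  (sigma^2 = {sigma ** 2})")
```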
Definition

The statistic

$$SS_W = \sum_{i=1}^{m}\sum_{j=1}^{n} (X_{ij} - X_{i.})^2$$

is called the within samples sum of squares because it is obtained by substituting the sample
means for the population means in expression 10.3.1. The statistic

$SS_W/(nm - m)$

is an estimator of $\sigma^2$.

Our second estimator of $\sigma^2$ will only be a valid estimator when the null hypothesis is
true. So let us assume that $H_0$ is true and so all the population means $\mu_i$ are equal, say,
$\mu_i = \mu$ for all $i$. Under this condition it follows that the $m$ sample means $X_{1.}, X_{2.}, \ldots, X_{m.}$
will all be normally distributed with the same mean $\mu$ and the same variance $\sigma^2/n$. Hence,
the sum of squares of the $m$ standardized variables

$$\frac{X_{i.} - \mu}{\sqrt{\sigma^2/n}} = \sqrt{n}\,(X_{i.} - \mu)/\sigma$$

will be a chi-square random variable with $m$ degrees of freedom. That is, when $H_0$ is true,

$$n\sum_{i=1}^{m} (X_{i.} - \mu)^2/\sigma^2 \sim \chi^2_m \qquad (10.3.3)$$

Now, when all the population means are equal to $\mu$, then the estimator of $\mu$ is the average
of all the $nm$ data values. That is, the estimator of $\mu$ is $X_{..}$, given by

$$X_{..} = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n} X_{ij}}{nm} = \frac{\sum_{i=1}^{m} X_{i.}}{m}$$

If we now substitute $X_{..}$ for the unknown parameter $\mu$ in expression 10.3.3, it follows,
when $H_0$ is true, that the resulting quantity

$$n\sum_{i=1}^{m} (X_{i.} - X_{..})^2/\sigma^2$$

will be a chi-square random variable with $m - 1$ degrees of freedom. That is, if we define
$SS_b$ by

$$SS_b = n\sum_{i=1}^{m} (X_{i.} - X_{..})^2$$

then it follows that

when $H_0$ is true, $SS_b/\sigma^2$ is chi-square with $m - 1$ degrees of freedom

From the above we obtain that when $H_0$ is true,

$E[SS_b]/\sigma^2 = m - 1$

or, equivalently,

$$E[SS_b/(m - 1)] = \sigma^2 \qquad (10.3.4)$$

So, when $H_0$ is true, $SS_b/(m - 1)$ is also an estimator of $\sigma^2$.

Definition

The statistic

$$SS_b = n\sum_{i=1}^{m} (X_{i.} - X_{..})^2$$

is called the between samples sum of squares. When $H_0$ is true, $SS_b/(m - 1)$ is an estimator
of $\sigma^2$.

Thus we have shown that

$SS_W/(nm - m)$ always estimates $\sigma^2$
$SS_b/(m - 1)$ estimates $\sigma^2$ when $H_0$ is true

Because it can be shown* that $SS_b/(m - 1)$ will tend to exceed $\sigma^2$ when $H_0$ is not true, it
is reasonable to let the test statistic be given by

$$TS = \frac{SS_b/(m - 1)}{SS_W/(nm - m)}$$

and to reject $H_0$ when $TS$ is sufficiently large.

* A proof is given at the end of this section.
TABLE 10.1 Values of $F_{r,s,.05}$

s = Degrees of Freedom   r = Degrees of Freedom for the Numerator
for the Denominator      1       2       3       4
4                        7.71    6.94    6.59    6.39
5                        6.61    5.79    5.41    5.19
10                       4.96    4.10    3.71    3.48


To determine how large $TS$ needs to be to justify rejecting $H_0$, we use the fact that
it can be shown that if $H_0$ is true then $SS_b$ and $SS_W$ are independent. It follows from
this that, when $H_0$ is true, $TS$ has an $F$-distribution with $m - 1$ numerator and $nm - m$
denominator degrees of freedom. Let $F_{m-1,nm-m,\alpha}$ denote the $100(1 - \alpha)$ percentile of
this distribution; that is,

$$P\{F_{m-1,nm-m} > F_{m-1,nm-m,\alpha}\} = \alpha$$

where we are using the notation $F_{r,s}$ to represent an $F$-random variable with $r$ numerator
and $s$ denominator degrees of freedom.

The significance level $\alpha$ test of $H_0$ is as follows:

reject $H_0$ if $\dfrac{SS_b/(m - 1)}{SS_W/(nm - m)} > F_{m-1,nm-m,\alpha}$
do not reject $H_0$ otherwise

A table of values of $F_{r,s,.05}$ for various values of $r$ and $s$ is presented in Table A4 of the
Appendix. Part of this table is presented in Table 10.1. For instance, from Table 10.1 we
see that there is a 5 percent chance that an $F$-random variable having 3 numerator and 10
denominator degrees of freedom will exceed 3.71.

Another way of doing the computations for the hypothesis test that all the population
means are equal is by computing the $p$-value. If the value of the test statistic is $TS = v$,
then the $p$-value will be given by

$$p\text{-value} = P\{F_{m-1,nm-m} \geq v\}$$

Program 10.3 will compute the value of the test statistic $TS$ and the resulting $p$-value.

EXAMPLE 10.3a An auto rental firm is using 15 identical motors that are adjusted to run 
at a fixed speed to test 3 different brands of gasoline. Each brand of gasoline is assigned to 
exactly 5 of the motors. Each motor runs on 10 gallons of gasoline until it is out of fuel. 
FIGURE 10.1 Program 10.3 output ("p-Values in a One-way ANOVA") for the data of Example 10.3a: $SS_W/(M(N-1)) = 165.9667$, $SS_B/(M-1) = 431.6667$; the value of the F-statistic is 2.6009 and the p-value is 0.1124.



The following represents the total mileages obtained by the different motors:

Gas 1   220   251   226   246   260
Gas 2   244   235   232   242   225
Gas 3   252   272   250   238   256



Test the hypothesis that the average mileage obtained is not affected by the type of gas 
used. Use the 5 percent level of significance. 

SOLUTION We run Program 10.3 to obtain the results shown in Figure 10.1. Since the 
$p$-value is greater than .05, the null hypothesis that the mean mileage is the same for all 3
brands of gasoline cannot be rejected. ■ 

The following algebraic identity, called the sum of squares identity, is useful when 
doing the computations by hand. 

The Sum of Squares Identity

$$\sum_{i=1}^{m}\sum_{j=1}^{n} X_{ij}^2 = nmX_{..}^2 + SS_b + SS_W$$

When computing by hand, the quantity $SS_b$, defined by

$$SS_b = n\sum_{i=1}^{m} (X_{i.} - X_{..})^2$$

should be computed first. Once $SS_b$ has been computed, $SS_W$ can be determined from the
sum of squares identity. That is, $\sum_{i=1}^{m}\sum_{j=1}^{n} X_{ij}^2$ and $X_{..}$ should also be computed and
then $SS_W$ determined from

$$SS_W = \sum_{i=1}^{m}\sum_{j=1}^{n} X_{ij}^2 - nmX_{..}^2 - SS_b$$

EXAMPLE 10.3b Let us do the computations of Example 10.3a by hand. The first thing to
note is that subtracting a constant from each data value will not affect the value of the test
statistic. So we subtract 220 from each data value to get the following information.

Gas   Mileage                $\sum_j X_{ij}$   $\sum_j X_{ij}^2$
1     0    31   6    26   40   103               3,273
2     24   15   12   22   5    78                1,454
3     32   52   30   18   36   168               6,248

Now $m = 3$ and $n = 5$ and

$X_{1.} = 103/5 = 20.6$
$X_{2.} = 78/5 = 15.6$
$X_{3.} = 168/5 = 33.6$

$X_{..} = (103 + 78 + 168)/15 = 23.2667, \quad X_{..}^2 = 541.3393$

Thus,

$SS_b = 5[(20.6 - 23.2667)^2 + (15.6 - 23.2667)^2 + (33.6 - 23.2667)^2] = 863.3335$

Also,

$\sum\sum X_{ij}^2 = 3{,}273 + 1{,}454 + 6{,}248 = 10{,}975$

and, from the sum of squares identity,

$SS_W = 10{,}975 - 15(541.3393) - 863.3335 = 1991.5770$

The value of the test statistic is thus

$$TS = \frac{863.3335/2}{1991.5770/12} = 2.60$$

Now, from Table A4 in the Appendix, we see that $F_{2,12,.05} = 3.89$. Hence, because
the value of the test statistic does not exceed 3.89, we cannot, at the 5 percent level of
significance, reject the null hypothesis that the gasolines give equal mileage. ■
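The computations of Examples 10.3a and 10.3b can be checked with a short sketch. It evaluates $SS_b$, $SS_W$, and $TS$ directly from their definitions and also verifies the sum of squares identity. The critical value 3.89 is $F_{2,12,.05}$ as read from Table A4. The function name is hypothetical.

```python
def one_way_anova(samples):
    """Return (SS_b, SS_W, TS) for m samples that all have the same size n."""
    m, n = len(samples), len(samples[0])
    means = [sum(s) / n for s in samples]      # X_i.
    grand = sum(means) / m                      # X..
    ss_b = n * sum((xb - grand) ** 2 for xb in means)
    ss_w = sum((x - xb) ** 2 for s, xb in zip(samples, means) for x in s)
    ts = (ss_b / (m - 1)) / (ss_w / (n * m - m))
    return ss_b, ss_w, ts

gas = [[220, 251, 226, 246, 260],
       [244, 235, 232, 242, 225],
       [252, 272, 250, 238, 256]]
ss_b, ss_w, ts = one_way_anova(gas)

# Sum of squares identity: sum of X_ij^2 = nm * X..^2 + SS_b + SS_W
m, n = len(gas), len(gas[0])
grand = sum(x for s in gas for x in s) / (n * m)
total_sq = sum(x * x for s in gas for x in s)
assert abs(total_sq - (n * m * grand ** 2 + ss_b + ss_w)) < 1e-6

critical = 3.89   # F_{2,12,.05}, read from Table A4
print(f"SS_b = {ss_b:.4f}, SS_W = {ss_w:.4f}, TS = {ts:.4f}")
print("reject H0" if ts > critical else "do not reject H0")
```

On the mileage data it should report $SS_b \approx 863.33$, $SS_W \approx 1991.60$, and $TS \approx 2.60$. These agree with the hand computation up to rounding, so $H_0$ is not rejected. Running it on the data with 220 subtracted from each value gives the same $TS$.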

Let us now show that

$$E[SS_b/(m - 1)] \geq \sigma^2$$

with equality only when $H_0$ is true. So, we must show that

$$E\left[\sum_{i=1}^m (\bar{X}_{i.} - \bar{X}_{..})^2/(m - 1)\right] \geq \sigma^2/n$$

with equality only when $H_0$ is true. To verify this, let $\mu_. = \sum_{i=1}^m \mu_i/m$ be the average of the means. Also, for $i = 1, \ldots, m$, let

$$Y_i = \bar{X}_{i.} - \mu_i + \mu_.$$

Because $\bar{X}_{i.}$ is normal with mean $\mu_i$ and variance $\sigma^2/n$, it follows that $Y_i$ is normal with mean $\mu_.$ and variance $\sigma^2/n$. Consequently, $Y_1, \ldots, Y_m$ constitutes a sample from a normal population having variance $\sigma^2/n$. Let

$$\bar{Y} = \sum_{i=1}^m Y_i/m = \bar{X}_{..} - \mu_. + \mu_. = \bar{X}_{..}$$

be the average of these variables. Now,

$$\bar{X}_{i.} - \bar{X}_{..} = Y_i + \mu_i - \mu_. - \bar{Y}$$

Consequently,

$$\begin{aligned}
E\left[\sum_{i=1}^m (\bar{X}_{i.} - \bar{X}_{..})^2\right]
&= E\left[\sum_{i=1}^m (Y_i - \bar{Y} + \mu_i - \mu_.)^2\right] \\
&= E\left[\sum_{i=1}^m \left[(Y_i - \bar{Y})^2 + (\mu_i - \mu_.)^2 + 2(\mu_i - \mu_.)(Y_i - \bar{Y})\right]\right] \\
&= E\left[\sum_{i=1}^m (Y_i - \bar{Y})^2\right] + \sum_{i=1}^m (\mu_i - \mu_.)^2 + 2\sum_{i=1}^m (\mu_i - \mu_.)E[Y_i - \bar{Y}] \\
&= (m - 1)\sigma^2/n + \sum_{i=1}^m (\mu_i - \mu_.)^2 + 2\sum_{i=1}^m (\mu_i - \mu_.)E[Y_i - \bar{Y}] \\
&= (m - 1)\sigma^2/n + \sum_{i=1}^m (\mu_i - \mu_.)^2
\end{aligned}$$

where the next to last equality follows because the sample variance $\sum_{i=1}^m (Y_i - \bar{Y})^2/(m - 1)$ is an unbiased estimator of its population variance $\sigma^2/n$, and the final equality because $E[Y_i] - E[\bar{Y}] = \mu_. - \mu_. = 0$. Dividing by $m - 1$ gives that



$$E\left[\sum_{i=1}^m (\bar{X}_{i.} - \bar{X}_{..})^2/(m - 1)\right] = \sigma^2/n + \sum_{i=1}^m (\mu_i - \mu_.)^2/(m - 1)$$

and the result follows because $\sum_{i=1}^m (\mu_i - \mu_.)^2 \geq 0$, with equality only when all the $\mu_i$ are equal.

Table 10.2 sums up the results of this section.

TABLE 10.2 One-Way ANOVA Table

Source of Variation | Sum of Squares | Degrees of Freedom | Value of Test Statistic
Between samples | $SS_b = n\sum_{i=1}^m (\bar{X}_{i.} - \bar{X}_{..})^2$ | $m - 1$ | $TS = \dfrac{SS_b/(m - 1)}{SS_W/(nm - m)}$
Within samples | $SS_W = \sum_{i=1}^m \sum_{j=1}^n (X_{ij} - \bar{X}_{i.})^2$ | $nm - m$ |

Significance level $\alpha$ test:
reject $H_0$ if $TS > F_{m-1, nm-m, \alpha}$
do not reject otherwise

If $TS = v$, then p-value $= P\{F_{m-1, nm-m} > v\}$
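Multiplying the last display of the derivation by $n$ gives $E[SS_b/(m - 1)] = \sigma^2 + n\sum_{i=1}^m (\mu_i - \mu_.)^2/(m - 1)$, and this can be illustrated by simulation. The Python sketch below is a Monte Carlo check under assumed, hypothetical parameter values. It is not one of the text's programs. It draws $m$ normal samples of size $n$ many times and averages $SS_b/(m - 1)$. It compares that average with the formula, first with equal means (so the average should be near $\sigma^2$) and then with unequal means.

```python
import random


def average_ssb_ratio(mus, n, sigma, reps, seed=1):
    """Monte Carlo average of SS_b/(m - 1) for m normal samples of size n."""
    rng = random.Random(seed)
    m = len(mus)
    total = 0.0
    for _ in range(reps):
        # Sample means X_i. of n draws from N(mu_i, sigma^2)
        means = [sum(rng.gauss(mu, sigma) for _ in range(n)) / n for mu in mus]
        grand = sum(means) / m
        ss_b = n * sum((xb - grand) ** 2 for xb in means)
        total += ss_b / (m - 1)
    return total / reps


def expected_ssb_ratio(mus, n, sigma):
    """sigma^2 + n * sum_i (mu_i - mu.)^2 / (m - 1), from the derivation above."""
    m = len(mus)
    mu_dot = sum(mus) / m
    return sigma ** 2 + n * sum((mu - mu_dot) ** 2 for mu in mus) / (m - 1)


# Hypothetical means: first H0 true, then H0 false
for mus in ([0.0, 0.0, 0.0], [0.0, 0.5, 1.0]):
    sim = average_ssb_ratio(mus, n=5, sigma=1.0, reps=20000)
    print(mus, "simulated:", round(sim, 3), "formula:", expected_ssb_ratio(mus, 5, 1.0))
```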



10.3.1 Multiple Comparisons of Sample Means 

When the null hypothesis of equal means is rejected, we are often interested in a comparison of the different means $\mu_1, \ldots, \mu_m$. One procedure that is often used for this purpose is known as the T-method. For a specified value of $\alpha$, this procedure gives joint confidence intervals for all the $\binom{m}{2}$ differences $\mu_i - \mu_j$, $i \neq j$, $i, j = 1, \ldots, m$, such that with probability $1 - \alpha$ all of the confidence intervals will contain their respective quantities $\mu_i - \mu_j$. The T-method is based on the following result:

With probability $1 - \alpha$, for every $i \neq j$,

$$\bar{X}_{i.} - \bar{X}_{j.} - W < \mu_i - \mu_j < \bar{X}_{i.} - \bar{X}_{j.} + W$$

where

$$W = \frac{1}{\sqrt{n}}\, C(m, nm - m, \alpha)\sqrt{SS_W/(nm - m)}$$

and where the values of $C(m, nm - m, \alpha)$ are given, for $\alpha = .05$ and $\alpha = .01$, in Table A5 of the Appendix.

EXAMPLE 10.3c A college administrator claims that there is no difference in first-year grade point averages for students entering the college from any of three different city high schools. The following data give the first-year grade point averages of 12 randomly chosen students, 4 from each of the three high schools. At the 5 percent level of significance, do these data disprove the claim of the administrator? If so, determine confidence intervals for the difference in means of students from the different high schools, such that we can be 95 percent confident that all of the interval statements are valid.

School 1 | School 2 | School 3
3.2 | 3.4 | 2.8
3.4 | 3.0 | 2.6
3.3 | 3.7 | 3.0
3.5 | 3.3 | 2.7

SOLUTION To begin, note that there are $m = 3$ samples, each of size $n = 4$. Program 10.3 on the text disk yields the results:

$SS_W/9 = .0431$
p-value $= .0046$

so the hypothesis of equal mean scores for students from the three schools is rejected.

To determine the confidence intervals for the differences in the population means, note first that the sample means are

$$\bar{X}_{1.} = 3.350, \qquad \bar{X}_{2.} = 3.350, \qquad \bar{X}_{3.} = 2.775$$

From Table A5 of the Appendix, we see that $C(3, 9, .05) = 3.95$; thus, as $W = \frac{1}{\sqrt{4}}(3.95)\sqrt{.0431} = .410$, we obtain the following confidence intervals:

$$-.410 < \mu_1 - \mu_2 < .410$$
$$.165 < \mu_1 - \mu_3 < .985$$
$$.165 < \mu_2 - \mu_3 < .985$$
Hence, with 95 percent confidence, we can conclude that the mean grade point average of first-year students from high school 3 is less than the mean average of students from high school 1 or from high school 2 by an amount that is between .165 and .985, and that the difference in grade point averages of students from high schools 1 and 2 is less than .410 in absolute value. ■
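The interval computation in this example can be sketched in Python as follows. The helper name `t_method` is hypothetical. It computes $SS_W$ and $W$ from the data and forms all $\binom{m}{2}$ intervals. It does not compute the constant $C(m, nm - m, \alpha)$, which must still be read from Table A5 (here $C(3, 9, .05) = 3.95$, as quoted above). The unrounded $SS_W/9 \approx .04306$ gives $W \approx .4098$, which still rounds to the .410 used in the text.

```python
import math


def t_method(samples, c_value):
    """Joint T-method intervals for all mu_i - mu_j (balanced samples).

    c_value is C(m, nm - m, alpha) read from a table such as Table A5;
    this function does not compute it.
    """
    m = len(samples)
    n = len(samples[0])
    means = [sum(s) / n for s in samples]                                  # X_i.
    ss_w = sum((x - xb) ** 2 for s, xb in zip(samples, means) for x in s)
    w = c_value * math.sqrt(ss_w / (n * m - m)) / math.sqrt(n)             # W
    intervals = {}
    for i in range(m):
        for j in range(i + 1, m):
            diff = means[i] - means[j]
            intervals[(i + 1, j + 1)] = (diff - w, diff + w)
    return means, ss_w, w, intervals


schools = [
    [3.2, 3.4, 3.3, 3.5],  # high school 1
    [3.4, 3.0, 3.7, 3.3],  # high school 2
    [2.8, 2.6, 3.0, 2.7],  # high school 3
]
means, ss_w, w, intervals = t_method(schools, c_value=3.95)  # C(3, 9, .05) from the text
print(f"SS_W/9 = {ss_w / 9:.4f}, W = {w:.3f}")
for (i, j), (lo, hi) in intervals.items():
    print(f"{lo:.3f} < mu_{i} - mu_{j} < {hi:.3f}")
```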

10.3.2 One-Way Analysis of Variance with Unequal Sample Sizes 

The model in the previous section supposed that there were an equal number of data points in each sample. Whereas this is certainly a desirable situation (see the Remark at the end of this section), it is not always possible to attain. So let us now suppose that we have $m$ normal samples of respective sizes $n_1, n_2, \ldots, n_m$. That is, the data consist of the $\sum_{i=1}^m n_i$ independent random variables $X_{ij}$, $j = 1, \ldots, n_i$, $i = 1, \ldots, m$, where

$$X_{ij} \sim \mathcal{N}(\mu_i, \sigma^2)$$

Again we are interested in testing the hypothesis $H_0$ that all means are equal.

To derive a test of $H_0$, we start with the fact that

$$\sum_{i=1}^m \sum_{j=1}^{n_i} (X_{ij} - E[X_{ij}])^2/\sigma^2 = \sum_{i=1}^m \sum_{j=1}^{n_i} (X_{ij} - \mu_i)^2/\sigma^2$$

is a chi-square random variable with $\sum_{i=1}^m n_i$ degrees of freedom. Hence, upon replacing each mean $\mu_i$ by its estimator $\bar{X}_{i.}$, the average of the elements in sample $i$, we obtain

$$\sum_{i=1}^m \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{i.})^2/\sigma^2$$

which is chi-square with $\sum_{i=1}^m n_i - m$ degrees of freedom. Therefore, letting

$$SS_W = \sum_{i=1}^m \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{i.})^2$$

it follows that $SS_W/\left(\sum_{i=1}^m n_i - m\right)$ is an unbiased estimator of $\sigma^2$.

Furthermore, if $H_0$ is true and $\mu$ is the common mean, then the random variables $\bar{X}_{i.}$, $i = 1, \ldots, m$ will be independent normal random variables with

$$E[\bar{X}_{i.}] = \mu, \qquad \text{Var}(\bar{X}_{i.}) = \sigma^2/n_i$$

As a result, when $H_0$ is true

$$\sum_{i=1}^m \frac{(\bar{X}_{i.} - \mu)^2}{\sigma^2/n_i} = \sum_{i=1}^m n_i(\bar{X}_{i.} - \mu)^2/\sigma^2$$

is chi-square with $m$ degrees of freedom; therefore, replacing $\mu$ in the preceding by its estimator $\bar{X}_{..}$, the average of all the $X_{ij}$, results in the statistic

$$\sum_{i=1}^m n_i(\bar{X}_{i.} - \bar{X}_{..})^2/\sigma^2$$

which is chi-square with $m - 1$ degrees of freedom. Thus, letting

$$SS_b = \sum_{i=1}^m n_i(\bar{X}_{i.} - \bar{X}_{..})^2$$

it follows, when $H_0$ is true, that $SS_b/(m - 1)$ is also an unbiased estimator of $\sigma^2$. Because it can be shown that when $H_0$ is true the quantities $SS_b$ and $SS_W$ are independent, it follows under this condition that the statistic

$$\frac{SS_b/(m - 1)}{SS_W/\left(\sum_{i=1}^m n_i - m\right)}$$

is an F-random variable with $m - 1$ numerator and $\sum_{i=1}^m n_i - m$ denominator degrees of freedom. From this we can conclude that a significance level $\alpha$ test of the null hypothesis

$$H_0: \mu_1 = \cdots = \mu_m$$

is to

reject $H_0$ if $\dfrac{SS_b/(m - 1)}{SS_W/N} > F_{m-1, N, \alpha}$, where $N = \sum_{i=1}^m n_i - m$
not reject $H_0$ otherwise
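A minimal Python sketch of this test for unequal sample sizes follows. The function name and the three small samples in the usage example are hypothetical, chosen only for illustration. The function returns the statistic and its degrees of freedom; the critical value $F_{m-1, N, \alpha}$ would still come from a table such as Table A4. When all the $n_i$ are equal, $SS_b$ and $N$ reduce to those of Section 10.3, so the same function can be checked against Example 10.3b.

```python
def one_way_unbalanced(samples):
    """One-way ANOVA F statistic when sample i has n_i observations.

    Returns (SS_b, SS_W, TS, numerator df, denominator df).
    """
    m = len(samples)
    sizes = [len(s) for s in samples]                         # n_1, ..., n_m
    total = sum(sizes)                                        # sum of the n_i
    means = [sum(s) / len(s) for s in samples]                # X_i.
    grand = sum(x for s in samples for x in s) / total        # X.., average of all X_ij
    ss_w = sum((x - xb) ** 2 for s, xb in zip(samples, means) for x in s)
    ss_b = sum(ni * (xb - grand) ** 2 for ni, xb in zip(sizes, means))
    df_num, df_den = m - 1, total - m                         # m - 1 and N
    ts = (ss_b / df_num) / (ss_w / df_den)
    return ss_b, ss_w, ts, df_num, df_den


# Hypothetical samples of sizes 3, 5, and 2, for illustration only
samples = [
    [5.1, 4.8, 5.3],
    [4.2, 4.6, 4.4, 4.9, 4.5],
    [5.6, 5.2],
]
ss_b, ss_w, ts, d1, d2 = one_way_unbalanced(samples)
print(f"SS_b = {ss_b:.4f}, SS_W = {ss_w:.4f}, TS = {ts:.3f} with ({d1}, {d2}) df")

# With equal sizes it reduces to Section 10.3: Example 10.3b (data minus 220) gives TS = 2.60
gas = [[0, 31, 6, 26, 40], [24, 15, 12, 22, 5], [32, 52, 30, 18, 36]]
print(f"Example 10.3b: TS = {one_way_unbalanced(gas)[2]:.2f}")
```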



REMARK 

When the samples are of different sizes we say that we are in the unbalanced case. Whenever 
possible it is advantageous to choose a balanced design over an unbalanced one. For one 
thing, the test statistic in a balanced design is relatively insensitive to slight departures from 
the assumption of equal population variances. (That is, the balanced design is more robust 
than the unbalanced one.) 
10.4 TWO-FACTOR ANALYSIS OF VARIANCE:
INTRODUCTION AND PARAMETER ESTIMATION

Whereas the model of Section 10.3 enabled us to study the effect of a single factor on a 
data set, we can also study the effects of several factors. In this section, we suppose that 
each data value is affected by two factors. 

EXAMPLE 10.4a Four different standardized reading achievement tests were administered to each of 5 students, with the resulting scores shown in the following table. Each value in this set of 20 data points is affected by two factors, namely, the exam and the student whose score on that exam is being recorded. The exam factor has 4 possible values, or levels, and the student factor has 5 possible levels.

Exam | Student 1 | Student 2 | Student 3 | Student 4 | Student 5
1 | 75 | 73 | 60 | 70 | 86
2 | 78 | 71 | 64 | 72 | 90
3 | 80 | 69 | 62 | 70 | 85
4 | 73 | 67 | 63 | 80 | 92

In general, let us suppose that there are $m$ possible levels of the first factor and $n$ possible levels of the second. Let $X_{ij}$ denote the value obtained when the first factor is at level $i$ and the second factor is at level $j$. We will often portray the data set in the following array of rows and columns.



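$$\begin{array}{cccccc}
X_{11} & X_{12} & \cdots & X_{1j} & \cdots & X_{1n} \\
X_{21} & X_{22} & \cdots & X_{2j} & \cdots & X_{2n} \\
\vdots & \vdots & & \vdots & & \vdots \\
X_{i1} & X_{i2} & \cdots & X_{ij} & \cdots & X_{in} \\
\vdots & \vdots & & \vdots & & \vdots \\
X_{m1} & X_{m2} & \cdots & X_{mj} & \cdots & X_{mn}
\end{array}$$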



Because of this we will refer to the first factor as the "row" factor, and the second factor as 
the "column" factor. 

As in Section 10.3, we will suppose that the data $X_{ij}$, $i = 1, \ldots, m$, $j = 1, \ldots, n$ are independent normal random variables with a common variance $\sigma^2$. However, whereas in Section 10.3 we supposed that only a single factor affected the mean value of a data point (namely, the sample to which it belongs), we will suppose in the present section that the mean value of a data point depends in an additive manner on both its row and its column.

If, in the model of Section 10.3, we let $X_{ij}$ represent the value of the $j$th member of sample $i$, then that model could be symbolically represented as

$$E[X_{ij}] = \mu_i$$
However, if we let $\mu$ denote the average value of the $\mu_i$, that is,

$$\mu = \sum_{i=1}^m \mu_i/m$$

then we can rewrite the model as

$$E[X_{ij}] = \mu + \alpha_i$$

where $\alpha_i = \mu_i - \mu$. With this definition of $\alpha_i$ as the deviation of $\mu_i$ from the average mean value, it is easy to see that

$$\sum_{i=1}^m \alpha_i = 0$$

A two-factor additive model can also be expressed in terms of row and column deviations. If we let $\mu_{ij} = E[X_{ij}]$, then the additive model supposes that for some constants $a_i$, $i = 1, \ldots, m$ and $b_j$, $j = 1, \ldots, n$

$$\mu_{ij} = a_i + b_j$$

Continuing our use of the "dot" (or averaging) notation, we let

$$\mu_{i.} = \sum_{j=1}^n \mu_{ij}/n, \qquad \mu_{.j} = \sum_{i=1}^m \mu_{ij}/m, \qquad \mu_{..} = \sum_{i=1}^m \sum_{j=1}^n \mu_{ij}/nm$$

Also, we let

$$a_. = \sum_{i=1}^m a_i/m, \qquad b_. = \sum_{j=1}^n b_j/n$$

Note that

$$\mu_{i.} = \sum_{j=1}^n (a_i + b_j)/n = a_i + b_.$$

Similarly,

$$\mu_{.j} = a_. + b_j, \qquad \mu_{..} = a_. + b_.$$

If we now set

$$\mu = \mu_{..} = a_. + b_.$$
$$\alpha_i = \mu_{i.} - \mu = a_i - a_.$$
$$\beta_j = \mu_{.j} - \mu = b_j - b_.$$

then the model can be written as

$$\mu_{ij} = E[X_{ij}] = \mu + \alpha_i + \beta_j$$

where

$$\sum_{i=1}^m \alpha_i = \sum_{j=1}^n \beta_j = 0$$

The value $\mu$ is called the grand mean, $\alpha_i$ is the deviation from the grand mean due to row $i$, and $\beta_j$ is the deviation from the grand mean due to column $j$.
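The reparameterization above can be checked numerically. In the Python sketch below, the constants $a_i$ and $b_j$ are hypothetical values chosen only for illustration. The code forms $\mu$, $\alpha_i$, $\beta_j$ exactly as defined above. It confirms that $\mu + \alpha_i + \beta_j = a_i + b_j$ in every cell and that the $\alpha_i$ and $\beta_j$ each sum to zero. It also shows that moving a constant from the $a_i$ to the $b_j$, which leaves every $\mu_{ij}$ unchanged, does not change $\mu$, $\alpha_i$, or $\beta_j$.

```python
def reparameterize(a, b):
    """Map additive constants a_i, b_j to (mu, alpha, beta) as defined in the text."""
    a_dot = sum(a) / len(a)              # a.
    b_dot = sum(b) / len(b)              # b.
    mu = a_dot + b_dot                   # grand mean mu = mu.. = a. + b.
    alpha = [ai - a_dot for ai in a]     # alpha_i = a_i - a.
    beta = [bj - b_dot for bj in b]      # beta_j = b_j - b.
    return mu, alpha, beta


a = [10.0, 12.0, 7.0]         # hypothetical row constants a_i
b = [1.0, -3.0, 5.0, 2.0]     # hypothetical column constants b_j
mu, alpha, beta = reparameterize(a, b)

# mu + alpha_i + beta_j reproduces mu_ij = a_i + b_j in every cell
max_gap = max(abs(mu + alpha[i] + beta[j] - (a[i] + b[j]))
              for i in range(len(a)) for j in range(len(b)))
print("mu =", mu, "| sum alpha =", sum(alpha), "| sum beta =", sum(beta), "| max gap =", max_gap)

# Shifting a constant from the a_i to the b_j leaves (mu, alpha, beta) unchanged
mu2, alpha2, beta2 = reparameterize([ai + 5 for ai in a], [bj - 5 for bj in b])
```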

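Let us now determine estimators of the parameters $\mu$, $\alpha_i$, $\beta_j$, $i = 1, \ldots, m$, $j = 1, \ldots, n$. To do so, continuing our use of "dot" notation, we let

$$\bar{X}_{i.} = \sum_{j=1}^n X_{ij}/n = \text{average of the values in row } i$$
$$\bar{X}_{.j} = \sum_{i=1}^m X_{ij}/m = \text{average of the values in column } j$$
$$\bar{X}_{..} = \sum_{i=1}^m \sum_{j=1}^n X_{ij}/nm = \text{average of all data values}$$

Now,

$$\begin{aligned}
E[\bar{X}_{i.}] &= \sum_{j=1}^n E[X_{ij}]/n \\
&= \mu + \sum_{j=1}^n \alpha_i/n + \sum_{j=1}^n \beta_j/n \\
&= \mu + \alpha_i \qquad \text{since } \sum_{j=1}^n \beta_j = 0
\end{aligned}$$

Similarly, it follows that

$$E[\bar{X}_{.j}] = \mu + \beta_j, \qquad E[\bar{X}_{..}] = \mu$$

Because the preceding is equivalent to

$$E[\bar{X}_{..}] = \mu, \qquad E[\bar{X}_{i.} - \bar{X}_{..}] = \alpha_i, \qquad E[\bar{X}_{.j} - \bar{X}_{..}] = \beta_j$$

we see that unbiased estimators of $\mu$, $\alpha_i$, $\beta_j$ (call them $\hat{\mu}$, $\hat{\alpha}_i$, $\hat{\beta}_j$) are given by

$$\hat{\mu} = \bar{X}_{..}, \qquad \hat{\alpha}_i = \bar{X}_{i.} - \bar{X}_{..}, \qquad \hat{\beta}_j = \bar{X}_{.j} - \bar{X}_{..}$$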



EXAMPLE 10.4b The following data from Example 10.4a give the scores obtained when four different reading tests were given to each of five students. Use them to estimate the parameters of the model.

Examination | Student 1 | Student 2 | Student 3 | Student 4 | Student 5 | Row total | $\bar{X}_{i.}$
1 | 75 | 73 | 60 | 70 | 86 | 364 | 72.8
2 | 78 | 71 | 64 | 72 | 90 | 375 | 75
3 | 80 | 69 | 62 | 70 | 85 | 366 | 73.2
4 | 73 | 67 | 63 | 80 | 92 | 375 | 75
Column total | 306 | 280 | 249 | 292 | 353 | 1480 (grand total) |
$\bar{X}_{.j}$ | 76.5 | 70 | 62.25 | 73 | 88.25 | | $\bar{X}_{..} = 1480/20 = 74$

SOLUTION The estimators are

$$\hat{\mu} = 74$$

$$\hat{\alpha}_1 = 72.8 - 74 = -1.2 \qquad \hat{\beta}_1 = 76.5 - 74 = 2.5$$
$$\hat{\alpha}_2 = 75 - 74 = 1 \qquad \hat{\beta}_2 = 70 - 74 = -4$$
$$\hat{\alpha}_3 = 73.2 - 74 = -.8 \qquad \hat{\beta}_3 = 62.25 - 74 = -11.75$$
$$\hat{\alpha}_4 = 75 - 74 = 1 \qquad \hat{\beta}_4 = 73 - 74 = -1$$
$$\hat{\beta}_5 = 88.25 - 74 = 14.25$$

Therefore, for instance, if one of the students is randomly chosen and then given a randomly chosen examination, then our estimate of the mean score that will be obtained is $\hat{\mu} = 74$. If we were told that examination $i$ was taken, then this would increase our estimate of the mean score by the amount $\hat{\alpha}_i$; and if we were told that the student chosen was number $j$, then this would increase our estimate of the mean score by the amount $\hat{\beta}_j$. Thus, for instance, we would estimate that the score obtained on examination 1 by student 2 is the value of a random variable whose mean is $\hat{\mu} + \hat{\alpha}_1 + \hat{\beta}_2 = 74 - 1.2 - 4 = 68.8$. ■
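The estimates in this example follow directly from the row, column, and grand averages. Below is a minimal Python sketch that reproduces them from the data table. The function name `two_factor_estimates` is hypothetical.

```python
def two_factor_estimates(x):
    """Estimates (mu_hat, alpha_hat, beta_hat) for an m x n table, one value per cell."""
    m, n = len(x), len(x[0])
    row_means = [sum(row) / n for row in x]                              # X_i.
    col_means = [sum(x[i][j] for i in range(m)) / m for j in range(n)]   # X_.j
    mu_hat = sum(row_means) / m                                          # X..
    alpha_hat = [r - mu_hat for r in row_means]                          # X_i. - X..
    beta_hat = [c - mu_hat for c in col_means]                           # X_.j - X..
    return mu_hat, alpha_hat, beta_hat


scores = [  # rows: examinations 1-4; columns: students 1-5
    [75, 73, 60, 70, 86],
    [78, 71, 64, 72, 90],
    [80, 69, 62, 70, 85],
    [73, 67, 63, 80, 92],
]
mu_hat, alpha_hat, beta_hat = two_factor_estimates(scores)
print("mu_hat =", round(mu_hat, 2))
print("alpha_hat =", [round(a, 2) for a in alpha_hat])
print("beta_hat =", [round(b, 2) for b in beta_hat])
# Estimated mean score of student 2 on examination 1
print("estimate for exam 1, student 2 =", round(mu_hat + alpha_hat[0] + beta_hat[1], 2))
```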
10.5 TWO-FACTOR ANALYSIS OF VARIANCE:
TESTING HYPOTHESES

Consider the two-factor model in which one has data $X_{ij}$, $i = 1, \ldots, m$ and $j = 1, \ldots, n$. These data are assumed to be independent normal random variables with a common variance $\sigma^2$ and with mean values satisfying

$$E[X_{ij}] = \mu + \alpha_i + \beta_j$$

where

$$\sum_{i=1}^m \alpha_i = \sum_{j=1}^n \beta_j = 0$$

In this section, we will be concerned with testing the hypothesis

$$H_0: \text{all } \alpha_i = 0$$

against

$$H_1: \text{not all the } \alpha_i \text{ are equal to } 0$$

This null hypothesis states that there is no row effect, in that the value of a datum is not affected by its row factor level.

We will also be interested in testing the analogous hypothesis for columns, that is

$$H_0: \text{all } \beta_j \text{ are equal to } 0$$

against

$$H_1: \text{not all } \beta_j \text{ are equal to } 0$$

To obtain tests for the above null hypotheses, we will apply the analysis of variance approach in which two different estimators are derived for the variance $\sigma^2$. The first will always be a valid estimator, whereas the second will only be a valid estimator when the null hypothesis is true. In addition, the second estimator will tend to overestimate $\sigma^2$ when the null hypothesis is not true.

To obtain our first estimator of $\sigma^2$, we start with the fact that

$$\sum_{i=1}^m \sum_{j=1}^n (X_{ij} - E[X_{ij}])^2/\sigma^2 = \sum_{i=1}^m \sum_{j=1}^n (X_{ij} - \mu - \alpha_i - \beta_j)^2/\sigma^2$$

is chi-square with $nm$ degrees of freedom. If in the above expression we now replace the unknown parameters $\mu, \alpha_1, \alpha_2, \ldots, \alpha_m, \beta_1, \beta_2, \ldots, \beta_n$ by their estimators $\hat{\mu}, \hat{\alpha}_1, \hat{\alpha}_2, \ldots, \hat{\alpha}_m, \hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_n$, then it turns out that the resulting expression will remain chi-square but will lose 1 degree of freedom for each parameter that is estimated. To determine how many parameters are to be estimated, we must be careful to remember that $\sum_{i=1}^m \alpha_i = \sum_{j=1}^n \beta_j = 0$. Since the sum of all the $\alpha_i$ is equal to 0, it follows that once we have estimated $m - 1$ of the $\alpha_i$ then we have also estimated the final one. Hence, only $m - 1$ parameters are to be estimated in order to determine all of the estimators $\hat{\alpha}_i$. For the same reason, only $n - 1$ of the $\beta_j$ need be estimated to determine estimators for all $n$ of them. Because $\mu$ also must be estimated, we see that the number of parameters that need to be estimated is $1 + m - 1 + n - 1 = n + m - 1$. As a result, it follows that

$$\sum_{i=1}^m \sum_{j=1}^n (X_{ij} - \hat{\mu} - \hat{\alpha}_i - \hat{\beta}_j)^2/\sigma^2$$

is a chi-square random variable with $nm - (n + m - 1) = (n - 1)(m - 1)$ degrees of freedom.

Since $\hat{\mu} = \bar{X}_{..}$, $\hat{\alpha}_i = \bar{X}_{i.} - \bar{X}_{..}$, $\hat{\beta}_j = \bar{X}_{.j} - \bar{X}_{..}$, it follows that $\hat{\mu} + \hat{\alpha}_i + \hat{\beta}_j = \bar{X}_{i.} + \bar{X}_{.j} - \bar{X}_{..}$; thus,

$$\sum_{i=1}^m \sum_{j=1}^n (X_{ij} - \bar{X}_{i.} - \bar{X}_{.j} + \bar{X}_{..})^2/\sigma^2 \tag{10.5.1}$$

is a chi-square random variable with $(n - 1)(m - 1)$ degrees of freedom.

Definition

The statistic $SS_e$ defined by

$$SS_e = \sum_{i=1}^m \sum_{j=1}^n (X_{ij} - \bar{X}_{i.} - \bar{X}_{.j} + \bar{X}_{..})^2$$

is called the error sum of squares.

If we think of the difference between a value and its estimated mean as being an "error," then $SS_e$ is equal to the sum of the squares of the errors. Since $SS_e/\sigma^2$ is just the expression in 10.5.1, we see that $SS_e/\sigma^2$ is chi-square with $(n - 1)(m - 1)$ degrees of freedom. Because the expected value of a chi-square random variable is equal to its number of degrees of freedom, we have that

$$E[SS_e/\sigma^2] = (n - 1)(m - 1)$$

or

$$E\left[SS_e/[(n - 1)(m - 1)]\right] = \sigma^2$$

That is,

$$SS_e/[(n - 1)(m - 1)]$$

is an unbiased estimator of $\sigma^2$.
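The degrees-of-freedom count behind this result, $nm - (n + m - 1) = (n - 1)(m - 1)$, can be illustrated by simulation. The Python sketch below is a Monte Carlo check and is not part of the text. It generates data from the additive model with assumed parameter values: the Example 10.4b estimates are reused purely as hypothetical "true" values, with $\sigma = 2$. It computes $SS_e$ from the definition and averages $SS_e/[(n - 1)(m - 1)]$. That average should come out close to $\sigma^2 = 4$ whether or not the row and column effects are zero.

```python
import random


def error_sum_of_squares(x):
    """SS_e = sum_i sum_j (X_ij - X_i. - X_.j + X..)^2 for an m x n table."""
    m, n = len(x), len(x[0])
    row_means = [sum(row) / n for row in x]
    col_means = [sum(x[i][j] for i in range(m)) / m for j in range(n)]
    grand = sum(row_means) / m
    return sum((x[i][j] - row_means[i] - col_means[j] + grand) ** 2
               for i in range(m) for j in range(n))


def average_error_mean_square(mu, alpha, beta, sigma, reps, seed=2):
    """Monte Carlo average of SS_e/((n - 1)(m - 1)) under the additive model."""
    rng = random.Random(seed)
    m, n = len(alpha), len(beta)
    total = 0.0
    for _ in range(reps):
        # X_ij ~ N(mu + alpha_i + beta_j, sigma^2), all independent
        x = [[rng.gauss(mu + a + b, sigma) for b in beta] for a in alpha]
        total += error_sum_of_squares(x) / ((n - 1) * (m - 1))
    return total / reps


# Hypothetical "true" parameters (borrowed from Example 10.4b for illustration)
mu = 74.0
alpha = [-1.2, 1.0, -0.8, 1.0]
beta = [2.5, -4.0, -11.75, -1.0, 14.25]
print(average_error_mean_square(mu, alpha, beta, sigma=2.0, reps=5000))  # should be near 4
```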

Suppose now that we want to test the null hypothesis that there is no row effect, that is, we want to test

$$H_0: \text{all the } \alpha_i \text{ are equal to } 0$$

against

$$H_1: \text{not all the } \alpha_i \text{ are equal to } 0$$

To obtain a second estimator of $\sigma^2$, consider the row averages $\bar{X}_{i.}$, $i = 1, \ldots, m$. Note that, when $H_0$ is true, each $\alpha_i$ is equal to 0, and so

$$E[\bar{X}_{i.}] = \mu + \alpha_i = \mu$$

Because each $\bar{X}_{i.}$ is the average of $n$ random variables, each having variance $\sigma^2$, it follows that

$$\text{Var}(\bar{X}_{i.}) = \sigma^2/n$$

Thus, we see that when $H_0$ is true

$$\sum_{i=1}^m (\bar{X}_{i.} - E[\bar{X}_{i.}])^2/\text{Var}(\bar{X}_{i.}) = n\sum_{i=1}^m (\bar{X}_{i.} - \mu)^2/\sigma^2$$

will be chi-square with $m$ degrees of freedom. If we now substitute $\bar{X}_{..}$ (the estimator of $\mu$) for $\mu$ in the above, then the resulting expression will remain chi-square but with 1 less degree of freedom. We thus have the following:

when $H_0$ is true,

$$n\sum_{i=1}^m (\bar{X}_{i.} - \bar{X}_{..})^2/\sigma^2$$

is chi-square with $m - 1$ degrees of freedom.

Definition

The statistic $SS_r$ is defined by

$$SS_r = n\sum_{i=1}^m (\bar{X}_{i.} - \bar{X}_{..})^2$$

and is called the row sum of squares.

We saw earlier that when $H_0$ is true, $SS_r/\sigma^2$ is chi-square with $m - 1$ degrees of freedom. As a result, when $H_0$ is true,

$$E[SS_r/\sigma^2] = m - 1$$

or, equivalently,

$$E[SS_r/(m - 1)] = \sigma^2$$

In addition, it can be shown that $SS_r/(m - 1)$ will tend to be larger than $\sigma^2$ when $H_0$ is not true. Thus, once again we have obtained two estimators of $\sigma^2$. The first estimator, $SS_e/[(n - 1)(m - 1)]$, is a valid estimator whether or not the null hypothesis is true, whereas the second estimator, $SS_r/(m - 1)$, is only a valid estimator of $\sigma^2$ when $H_0$ is true and tends to be larger than $\sigma^2$ when $H_0$ is not true.






We base our test of the null hypothesis $H_0$ that there is no row effect on the ratio of the two estimators of $\sigma^2$. Specifically, we use the test statistic

$$TS = \frac{SS_r/(m - 1)}{SS_e/[(n - 1)(m - 1)]}$$

Because the estimators can be shown to be independent when $H_0$ is true, it follows that the significance level $\alpha$ test is to

reject $H_0$ if $TS > F_{m-1, (n-1)(m-1), \alpha}$
do not reject $H_0$ otherwise

Alternatively, the test can be performed by calculating the p-value. If the value of the test statistic is $v$, then the p-value is given by

$$\text{p-value} = P\{F_{m-1, (n-1)(m-1)} > v\}$$

A similar test can be derived for testing the null hypothesis that there is no column effect, that is, that all the $\beta_j$ are equal to 0. The results are summarized in Table 10.3. Program 10.5 will do the computations and give the p-value.

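TABLE 10.3 Two-Factor ANOVA

Source | Sum of Squares | Degrees of Freedom
Row | $SS_r = n\sum_{i=1}^m (\bar{X}_{i.} - \bar{X}_{..})^2$ | $m - 1$
Column | $SS_c = m\sum_{j=1}^n (\bar{X}_{.j} - \bar{X}_{..})^2$ | $n - 1$
Error | $SS_e = \sum_{i=1}^m \sum_{j=1}^n (X_{ij} - \bar{X}_{i.} - \bar{X}_{.j} + \bar{X}_{..})^2$ | $(n - 1)(m - 1)$

Let $N = (n - 1)(m - 1)$.

Null Hypothesis | Test Statistic | Significance Level $\alpha$ Test | p-value if $TS = v$
All $\alpha_i = 0$ | $\dfrac{SS_r/(m - 1)}{SS_e/N}$ | Reject if $TS > F_{m-1, N, \alpha}$ | $P\{F_{m-1, N} > v\}$
All $\beta_j = 0$ | $\dfrac{SS_c/(n - 1)}{SS_e/N}$ | Reject if $TS > F_{n-1, N, \alpha}$ | $P\{F_{n-1, N} > v\}$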



EXAMPLE 10.5a The following data* represent the number of different macroinvertebrate species collected at 6 stations, located in the vicinity of a thermal discharge, from 1970 to 1977.

* Taken from Wartz and Skinner, "A 12 year macroinvertebrate study in the vicinity of 2 thermal discharges to the Susquehanna River near York Haven, PA." Jour. of Testing and Evaluation, Vol. 12, No. 3, May 1984, 157-163.
Year | Station 1 | Station 2 | Station 3 | Station 4 | Station 5 | Station 6
1970 | 53 | 35 | 31 | 37 | 40 | 43
1971 | 36 | 34 | 17 | 21 | 30 | 18
1972 | 47 | 37 | 17 | 31 | 45 | 26
1973 | 55 | 31 | 17 | 23 | 43 | 37
1974 | 40 | 32 | 19 | 26 | 45 | 37
1975 | 52 | 42 | 20 | 27 | 26 | 32
1976 | 39 | 28 | 21 | 21 | 36 | 28
1977 | 40 | 32 | 21 | 21 | 36 | 35



FIGURE 10.2 Output of Program 10.5 ("The p-values in a Two-Way ANOVA") for the data of Example 10.5a:
The value of the F-statistic for testing that there is no row effect is 3.72985
The p-value for testing that there is no row effect is 0.00404
The value of the F-statistic for testing that there is no column effect is 22.47898
The p-value for testing that there is no column effect is less than 0.0001



To test the hypotheses that the data are unchanging (a) from year to year, and (b) from 
station to station, run Program 10.5. Results are shown in Figure 10.2. Thus both the 
hypothesis that the data distribution does not depend on the year and the hypothesis that 
it does not depend on the station are rejected at very small significance levels. ■ 
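The F-statistics reported in Figure 10.2 can be reproduced from the formulas of Table 10.3. In the Python sketch below, the function name is hypothetical. The rows are taken to be the years and the columns the stations; this is the arrangement that matches the reported values 3.72985 and 22.47898. The sketch computes only the statistics. The p-values additionally require the upper tail of the F distribution, which the sketch does not implement.

```python
def two_factor_anova(x):
    """Two-factor ANOVA with one observation per cell (formulas of Table 10.3).

    x[i][j] is the value at row level i and column level j.
    Returns (SS_r, SS_c, SS_e, F_row, F_col).
    """
    m, n = len(x), len(x[0])
    row_means = [sum(row) / n for row in x]                              # X_i.
    col_means = [sum(x[i][j] for i in range(m)) / m for j in range(n)]   # X_.j
    grand = sum(row_means) / m                                           # X..
    ss_r = n * sum((r - grand) ** 2 for r in row_means)
    ss_c = m * sum((c - grand) ** 2 for c in col_means)
    ss_e = sum((x[i][j] - row_means[i] - col_means[j] + grand) ** 2
               for i in range(m) for j in range(n))
    big_n = (n - 1) * (m - 1)                                            # error df N
    f_row = (ss_r / (m - 1)) / (ss_e / big_n)
    f_col = (ss_c / (n - 1)) / (ss_e / big_n)
    return ss_r, ss_c, ss_e, f_row, f_col


species = [  # rows: years 1970-1977; columns: stations 1-6
    [53, 35, 31, 37, 40, 43],
    [36, 34, 17, 21, 30, 18],
    [47, 37, 17, 31, 45, 26],
    [55, 31, 17, 23, 43, 37],
    [40, 32, 19, 26, 45, 37],
    [52, 42, 20, 27, 26, 32],
    [39, 28, 21, 21, 36, 28],
    [40, 32, 21, 21, 36, 35],
]
ss_r, ss_c, ss_e, f_row, f_col = two_factor_anova(species)
print(f"F for no year (row) effect:       {f_row:.5f} with (7, 35) df")
print(f"F for no station (column) effect: {f_col:.5f} with (5, 35) df")
```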
10.6 TWO-WAY ANALYSIS OF VARIANCE WITH
INTERACTION

In Sections 10.4 and 10.5, we considered experiments in which the distribution of the observed data depended on two factors, which we called the row and column factors. Specifically, we supposed that the mean value of $X_{ij}$, the data value in row $i$ and column $j$, can be expressed as the sum of two terms, one depending on the row of the element and one on the column. That is, we supposed that

$$X_{ij} \sim \mathcal{N}(\mu + \alpha_i + \beta_j, \sigma^2), \qquad i = 1, \ldots, m, \quad j = 1, \ldots, n$$

However, one weakness of this model is that in supposing that the row and column effects are additive, it does not allow for the possibility of a row and column interaction.

For instance, consider an experiment designed to compare the mean number of defective 
items produced by four different workers when using three different machines. In analyzing 
the resulting data, we might suppose that the incremental number of defects that resulted 
from using a given machine was the same for each of the workers. However, it is certainly 
possible that a machine could interact in a different manner with different workers. That 
is, there could be a worker-machine interaction that the additive model does not allow for. 

To allow for the possibility of a row and column interaction, let

$$\mu_{ij} = E[X_{ij}]$$

and define the quantities $\mu$, $\alpha_i$, $\beta_j$, $\gamma_{ij}$, $i = 1, \ldots, m$, $j = 1, \ldots, n$ as follows:

$$\mu = \mu_{..}$$
$$\alpha_i = \mu_{i.} - \mu_{..}$$
$$\beta_j = \mu_{.j} - \mu_{..}$$
$$\gamma_{ij} = \mu_{ij} - \mu_{i.} - \mu_{.j} + \mu_{..}$$

It is immediately apparent that

$$\mu_{ij} = \mu + \alpha_i + \beta_j + \gamma_{ij}$$

and it is easy to check that

$$\sum_{i=1}^m \alpha_i = \sum_{j=1}^n \beta_j = \sum_{i=1}^m \gamma_{ij} = \sum_{j=1}^n \gamma_{ij} = 0$$

The parameter $\mu$ is the average of all $nm$ mean values; it is called the grand mean. The parameter $\alpha_i$ is the amount by which the average of the mean values of the variables in row $i$ exceeds the grand mean; it is called the effect of row $i$. The parameter $\beta_j$ is the amount by which the average of the mean values of the variables in column $j$ exceeds the grand mean; it is called the effect of column $j$. The parameter $\gamma_{ij} = \mu_{ij} - (\mu + \alpha_i + \beta_j)$ is the amount by which $\mu_{ij}$ exceeds the sum of the grand mean and the increments due to row $i$ and to column $j$; it is thus a measure of the departure from row and column additivity of the mean value $\mu_{ij}$, and is called the interaction of row $i$ and column $j$.

As we shall see, in order to be able to test the hypothesis that there are no row and column interactions (that is, that all $\gamma_{ij} = 0$), it is necessary to have more than one observation for each pair of factor levels. So let us suppose that we have $l$ observations for each combination of row and column. That is, suppose that the data are $\{X_{ijk},\ i = 1, \ldots, m,\ j = 1, \ldots, n,\ k = 1, \ldots, l\}$, where $X_{ijk}$ is the $k$th observation in row $i$ and column $j$. Because all observations are assumed to be independent normal random variables with a common variance $\sigma^2$, the model is

$$X_{ijk} \sim \mathcal{N}(\mu + \alpha_i + \beta_j + \gamma_{ij}, \sigma^2)$$

where

$$\sum_{i=1}^m \alpha_i = \sum_{j=1}^n \beta_j = \sum_{i=1}^m \gamma_{ij} = \sum_{j=1}^n \gamma_{ij} = 0 \tag{10.6.1}$$

We will be interested in estimating the preceding parameters, and in testing the following null hypotheses:

$$H_0^r: \alpha_i = 0, \text{ for all } i$$
$$H_0^c: \beta_j = 0, \text{ for all } j$$
$$H_0^{int}: \gamma_{ij} = 0, \text{ for all } i, j$$

That is, $H_0^r$ is the hypothesis of no row effect; $H_0^c$ is the hypothesis of no column effect; and $H_0^{int}$ is the hypothesis of no row and column interaction.

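To estimate the parameters, note that it is easily verified from Equation 10.6.1 and the identity

$$E[X_{ijk}] = \mu_{ij} = \mu + \alpha_i + \beta_j + \gamma_{ij}$$

that

$$E[\bar{X}_{ij.}] = \mu_{ij} = \mu + \alpha_i + \beta_j + \gamma_{ij}$$
$$E[\bar{X}_{i..}] = \mu + \alpha_i$$
$$E[\bar{X}_{.j.}] = \mu + \beta_j$$
$$E[\bar{X}_{...}] = \mu$$

where, continuing the dot notation, a dot in place of a subscript indicates averaging over that index; for instance, $\bar{X}_{ij.} = \sum_{k=1}^l X_{ijk}/l$ and $\bar{X}_{i..} = \sum_{j=1}^n \sum_{k=1}^l X_{ijk}/nl$.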
Therefore, with a "hat" over a parameter denoting the estimator of that parameter, we obtain from the preceding that unbiased estimators are given by

$$\hat{\mu} = \bar{X}_{...}$$
$$\hat{\beta}_j = \bar{X}_{.j.} - \bar{X}_{...}$$
$$\hat{\alpha}_i = \bar{X}_{i..} - \bar{X}_{...}$$
$$\hat{\gamma}_{ij} = \bar{X}_{ij.} - \hat{\mu} - \hat{\beta}_j - \hat{\alpha}_i = \bar{X}_{ij.} - \bar{X}_{i..} - \bar{X}_{.j.} + \bar{X}_{...}$$

To develop tests for the null hypotheses $H_0^{int}$, $H_0^r$, and $H_0^c$, start with the fact that

$$\sum_{k=1}^l \sum_{j=1}^n \sum_{i=1}^m \frac{(X_{ijk} - \mu - \alpha_i - \beta_j - \gamma_{ij})^2}{\sigma^2}$$

is a chi-square random variable with $nml$ degrees of freedom. Therefore,

$$\sum_{k=1}^l \sum_{j=1}^n \sum_{i=1}^m \frac{(X_{ijk} - \hat{\mu} - \hat{\alpha}_i - \hat{\beta}_j - \hat{\gamma}_{ij})^2}{\sigma^2}$$

will also be chi-square, but with 1 degree of freedom lost for each parameter that is estimated. Now, since $\sum_i \alpha_i = 0$, it follows that $m - 1$ of the $\alpha_i$ need to be estimated; similarly, $n - 1$ of the $\beta_j$ need to be estimated. Also, since $\sum_i \gamma_{ij} = \sum_j \gamma_{ij} = 0$, it follows that if we arrange all the $\gamma_{ij}$ in a rectangular array having $m$ rows and $n$ columns, then all the row and column sums will equal 0, and so the values of the quantities in the last row and last column will be determined by the values of all the others; hence we need only estimate $(m - 1)(n - 1)$ of these quantities. Because we also need to estimate $\mu$, it follows that a total of

$$n - 1 + m - 1 + (n - 1)(m - 1) + 1 = nm$$

parameters needs to be estimated. Since

$$\hat{\mu} + \hat{\alpha}_i + \hat{\beta}_j + \hat{\gamma}_{ij} = \bar{X}_{ij.}$$

it thus follows from the preceding that if we let

$$SS_e = \sum_{k=1}^l \sum_{j=1}^n \sum_{i=1}^m (X_{ijk} - \bar{X}_{ij.})^2$$

then

$$\frac{SS_e}{\sigma^2} \text{ is chi-square with } nm(l - 1) \text{ degrees of freedom}$$

Therefore,

$$\frac{SS_e}{nm(l - 1)} \text{ is an unbiased estimator of } \sigma^2$$
Suppose now that we want to test the hypothesis that there are no row and column interactions, that is, we want to test

$$H_0^{int}: \gamma_{ij} = 0, \quad i = 1, \ldots, m, \quad j = 1, \ldots, n$$

Now, if $H_0^{int}$ is true, then the random variables $\bar{X}_{ij.}$ will be normal with mean

$$E[\bar{X}_{ij.}] = \mu + \alpha_i + \beta_j$$

Also, since each of these terms is the average of $l$ normal random variables having variance $\sigma^2$, it follows that

$$\text{Var}(\bar{X}_{ij.}) = \sigma^2/l$$

Hence, under the assumption of no interactions,

$$\sum_{j=1}^n \sum_{i=1}^m \frac{l(\bar{X}_{ij.} - \mu - \alpha_i - \beta_j)^2}{\sigma^2}$$

is a chi-square random variable with $nm$ degrees of freedom. Since a total of $1 + m - 1 + n - 1 = n + m - 1$ of the parameters $\mu$, $\alpha_i$, $i = 1, \ldots, m$, $\beta_j$, $j = 1, \ldots, n$, must be estimated, it follows that if we let

$$SS_{int} = \sum_{j=1}^n \sum_{i=1}^m l(\bar{X}_{ij.} - \hat{\mu} - \hat{\alpha}_i - \hat{\beta}_j)^2 = \sum_{j=1}^n \sum_{i=1}^m l(\bar{X}_{ij.} - \bar{X}_{i..} - \bar{X}_{.j.} + \bar{X}_{...})^2$$

then, under $H_0^{int}$,

$$\frac{SS_{int}}{\sigma^2} \text{ is chi-square with } (n - 1)(m - 1) \text{ degrees of freedom}$$

Therefore, under the assumption of no interactions,

$$\frac{SS_{int}}{(n - 1)(m - 1)} \text{ is an unbiased estimator of } \sigma^2$$

Because it can be shown that, under the assumption of no interactions, $SS_e$ and $SS_{int}$ are independent, it follows that when $H_0^{int}$ is true

$$F_{int} = \frac{SS_{int}/[(n - 1)(m - 1)]}{SS_e/[nm(l - 1)]}$$

is an F-random variable with $(n - 1)(m - 1)$ numerator and $nm(l - 1)$ denominator degrees of freedom. This gives rise to the following significance level $\alpha$ test of

$$H_0^{int}: \text{all } \gamma_{ij} = 0 \quad \text{versus} \quad H_1^{int}: \text{at least one } \gamma_{ij} \neq 0$$

Namely,

reject $H_0^{int}$ if $\dfrac{SS_{int}/[(n - 1)(m - 1)]}{SS_e/[nm(l - 1)]} > F_{(n-1)(m-1), nm(l-1), \alpha}$
do not reject $H_0^{int}$ otherwise

Alternatively, we can compute the p-value. If $F_{int} = v$, then the p-value of the test of the null hypothesis that all interactions equal 0 is

$$\text{p-value} = P\{F_{(n-1)(m-1), nm(l-1)} > v\}$$

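If we want to test the null hypothesis

$$H_0^r: \alpha_i = 0, \quad i = 1, \ldots, m$$

then we use the fact that when $H_0^r$ is true, $\bar{X}_{i..}$ is the average of $nl$ independent normal random variables, each having variance $\sigma^2$, and its mean is $\mu$. Hence, under $H_0^r$,

$$E[\bar{X}_{i..}] = \mu, \qquad \text{Var}(\bar{X}_{i..}) = \sigma^2/nl$$

and so

$$\sum_{i=1}^m \frac{nl(\bar{X}_{i..} - \mu)^2}{\sigma^2}$$

is chi-square with $m$ degrees of freedom. Thus, if we let

$$SS_r = \sum_{i=1}^m nl(\bar{X}_{i..} - \hat{\mu})^2 = \sum_{i=1}^m nl(\bar{X}_{i..} - \bar{X}_{...})^2$$

then, when $H_0^r$ is true,

$$\frac{SS_r}{\sigma^2} \text{ is chi-square with } m - 1 \text{ degrees of freedom}$$

and so

$$\frac{SS_r}{m - 1} \text{ is an unbiased estimator of } \sigma^2$$

Because it can be shown that, under $H_0^r$, $SS_e$ and $SS_r$ are independent, it follows that when $H_0^r$ is true

$$\frac{SS_r/(m - 1)}{SS_e/[nm(l - 1)]} \text{ is an } F_{m-1, nm(l-1)} \text{ random variable}$$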






Thus we have the following significance level $\alpha$ test of

$$H_0^r: \text{all } \alpha_i = 0 \quad \text{versus} \quad H_1^r: \text{at least one } \alpha_i \neq 0$$

Namely,

reject $H_0^r$ if $\dfrac{SS_r/(m - 1)}{SS_e/[nm(l - 1)]} > F_{m-1, nm(l-1), \alpha}$
do not reject $H_0^r$ otherwise

Alternatively, if $\dfrac{SS_r/(m - 1)}{SS_e/[nm(l - 1)]} = v$, then

$$\text{p-value} = P\{F_{m-1, nm(l-1)} > v\}$$

Because an analogous result can be shown to hold when testing $H_0^c$: all $\beta_j = 0$, we obtain the ANOVA information shown in Table 10.4.

Note that all of the preceding tests call for rejection only when their related F-statistic is large. The reason that only large (and not small) values call for rejection of the null hypothesis is that the numerator of the F-statistic will tend to be larger when $H_0$ is not true than when it is, whereas the distribution of the denominator will be the same whether or not $H_0$ is true.

Program 10.6 computes the values of the F-statistics and their associated p-values.

EXAMPLE 10.6a The life of a particular type of generator is thought to be influenced by 
the material used in its construction and also by the temperature at the location where it 
is utilized. The following table represents lifetime data on 24 generators made from three 
different types of materials and utilized at two different temperatures. Do the data indicate 
that the material and the temperature do indeed affect the lifetime of a generator? Is there 
evidence of an interaction effect? 



Material | Temperature 10°C | Temperature 18°C
1 | 135, 150, 176, 85 | 50, 55, 64, 38
2 | 150, 162, 171, 120 | 76, 88, 91, 57
3 | 138, 111, 140, 106 | 68, 60, 74, 51



SOLUTION Run Program 10.6 (see Figures 10.3 and 10.4). 



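TABLE 10.4 Two-Way ANOVA with $l$ Observations per Cell: $N = nm(l - 1)$

Source of Variation | Degrees of Freedom | Sum of Squares | F-Statistic | Level $\alpha$ Test | p-Value if $F = v$
Row | $m - 1$ | $SS_r = ln\sum_{i=1}^m (\bar{X}_{i..} - \bar{X}_{...})^2$ | $F_r = \dfrac{SS_r/(m - 1)}{SS_e/N}$ | Reject $H_0^r$ if $F_r > F_{m-1, N, \alpha}$ | $P\{F_{m-1, N} > v\}$
Column | $n - 1$ | $SS_c = lm\sum_{j=1}^n (\bar{X}_{.j.} - \bar{X}_{...})^2$ | $F_c = \dfrac{SS_c/(n - 1)}{SS_e/N}$ | Reject $H_0^c$ if $F_c > F_{n-1, N, \alpha}$ | $P\{F_{n-1, N} > v\}$
Interaction | $(n - 1)(m - 1)$ | $SS_{int} = l\sum_{j=1}^n \sum_{i=1}^m (\bar{X}_{ij.} - \bar{X}_{i..} - \bar{X}_{.j.} + \bar{X}_{...})^2$ | $F_{int} = \dfrac{SS_{int}/[(n - 1)(m - 1)]}{SS_e/N}$ | Reject $H_0^{int}$ if $F_{int} > F_{(n-1)(m-1), N, \alpha}$ | $P\{F_{(n-1)(m-1), N} > v\}$
Error | $N$ | $SS_e = \sum_{k=1}^l \sum_{j=1}^n \sum_{i=1}^m (X_{ijk} - \bar{X}_{ij.})^2$ | | |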
FIGURE 10.3 Program 10.6 data-entry window (number of rows, number of columns, and number of observations in each cell).

FIGURE 10.4 Output of Program 10.6 ("The p-values in a Two-way ANOVA with Possible Interaction") for the data of Example 10.6a:
The value of the F-statistic for testing that there is no row effect is 2.47976
The p-value for testing that there is no row effect is 0.1093
The value of the F-statistic for testing that there is no column effect is 69.63223
The p-value for testing that there is no column effect is less than 0.0001
The value of the F-statistic for testing that there is no interaction effect is 0.64625
The p-value for testing that there is no interaction effect is 0.5329
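Likewise, the three F-statistics in Figure 10.4 can be reproduced from the sums of squares in Table 10.4. In this Python sketch, the function name is hypothetical. Rows are materials, columns are temperatures, and each cell holds its $l = 4$ lifetimes. The function also returns the interaction estimates $\hat{\gamma}_{ij}$. As before, p-values are not computed, since that needs the F distribution's upper-tail probability.

```python
def two_way_with_interaction(cells):
    """Two-way ANOVA with l observations per cell (formulas of Table 10.4).

    cells[i][j] is the list of the l observations at row i, column j.
    """
    m, n, l = len(cells), len(cells[0]), len(cells[0][0])
    cell_means = [[sum(c) / l for c in row] for row in cells]                    # X_ij.
    row_means = [sum(row) / n for row in cell_means]                              # X_i..
    col_means = [sum(cell_means[i][j] for i in range(m)) / m for j in range(n)]   # X_.j.
    grand = sum(row_means) / m                                                    # X...
    gamma_hat = [[cell_means[i][j] - row_means[i] - col_means[j] + grand
                  for j in range(n)] for i in range(m)]                           # gamma_hat_ij
    ss_r = l * n * sum((r - grand) ** 2 for r in row_means)
    ss_c = l * m * sum((c - grand) ** 2 for c in col_means)
    ss_int = l * sum(g * g for row in gamma_hat for g in row)
    ss_e = sum((x - cell_means[i][j]) ** 2
               for i in range(m) for j in range(n) for x in cells[i][j])
    mse = ss_e / (n * m * (l - 1))                                                # SS_e / N
    return {
        "F_r": (ss_r / (m - 1)) / mse,
        "F_c": (ss_c / (n - 1)) / mse,
        "F_int": (ss_int / ((n - 1) * (m - 1))) / mse,
        "SS": (ss_r, ss_c, ss_int, ss_e),
        "gamma_hat": gamma_hat,
    }


generators = [  # rows: materials 1-3; columns: 10 C and 18 C; 4 lifetimes per cell
    [[135, 150, 176, 85], [50, 55, 64, 38]],
    [[150, 162, 171, 120], [76, 88, 91, 57]],
    [[138, 111, 140, 106], [68, 60, 74, 51]],
]
result = two_way_with_interaction(generators)
for name in ("F_r", "F_c", "F_int"):
    print(f"{name} = {result[name]:.5f}")
```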
Problems

1. A purification process for a chemical involves passing it, in solution, through a resin on which impurities are adsorbed. A chemical engineer wishing to test the efficiency of 3 different resins took a chemical solution and broke it into 15 batches. She tested each resin 5 times and then measured the concentration of impurities after passing through the resins. Her data were as follows:



Concentration of Impurities

Resin I    Resin II    Resin III
.046       .038        .031
.025       .035        .042
.014       .031        .020
.017       .022        .018
.043       .012        .039

Test the hypothesis that there is no difference in the efficiency of the resins. 

2. We want to know what type of filter should be used over the screen of a cathode- 
ray oscilloscope in order to have a radar operator easily pick out targets on the 
presentation. A test to accomplish this has been set up. A noise is first applied to 
the scope to make it difficult to pick out a target. A second signal, representing the 
target, is put into the scope, and its intensity is increased from zero until detected 
by the observer. The intensity setting at which the observer first notices the target 
signal is then recorded. This experiment is repeated 20 times with each filter. The 
numerical value of each reading listed in the table of data is proportional to the 
target intensity at the time the operator first detects the target. 



Filter No. 1    Filter No. 2    Filter No. 3
90              88              95
87              90              95
93              97              89
96              87              98
94              90              96
88              96              81
90              90              92
84              90              79
101             100             105
96              93              98
90              95              92
82              86              85
93              89              97
90              92              90
96              98              87
87              95              90
99              102             101
101             105             100
79              85              84
98              97              102



Test, at the 5 percent level of significance, the hypothesis that the filters are the 
same. 

3. Explain why we cannot efficiently test the hypothesis $H_0: \mu_1 = \mu_2 = \cdots = \mu_m$
by running t-tests on all of the $\binom{m}{2}$ pairs of samples.

4. A machine shop contains 3 ovens that are used to heat metal specimens. Subject to 
random fluctuations, they are all supposed to heat to the same temperature. To test 
this hypothesis, temperatures were noted on 15 separate heatings. The following 
data resulted. 



Oven    Temperature

1       492.4, 493.6, 498.5, 488.6, 494
2       488.5, 485.3, 482, 479.4, 478
3       502.1, 492, 497.5, 495.3, 486.7

Do the ovens appear to operate at the same temperature? Test at the 5 percent
level of significance. What is the p-value?

5. Four standard chemical procedures are used to determine the magnesium content 
in a certain chemical compound. Each procedure is used four times on a given 
compound with the following data resulting. 



Method

1        2        3        4
76.42    80.41    74.20    86.20
78.62    82.26    72.68    86.04
80.40    81.15    78.84    84.36
78.20    79.20    80.32    80.68



Do the data indicate that the procedures yield equivalent results? 
6. Twenty overweight individuals, each more than 40 pounds overweight, were randomly
assigned to one of two diets. After 10 weeks, the total weight losses (in
pounds) of the individuals on each of the diets were as follows:



Weight Loss

Diet 1    Diet 2
22.2      24.2
23.4      16.8
24.2      14.6
16.1      13.7
9.4       19.5
12.5      17.6
18.6      11.2
32.2      9.5
8.8       30.1
7.6       21.5



Test, at the 5 percent level of significance, the hypothesis that the two diets have 
equal effect. 

7. In a test of the ability of a certain polymer to remove toxic wastes from water, experiments
were conducted at three different temperatures. The data below give the
percentages of the impurities that were removed by the polymer in 21 independent
attempts.

Low Temperature Medium Temperature High Temperature 

42 36 33 

41 35 44 

37 32 40 

29 38 36 

35 39 44 

40 42 37 

32 34 45 

Test the hypothesis that the polymer performs equally well at all three temperatures. 
Use the (a) 5 percent level of significance and (b) 1 percent level of significance. 

8. In the one-factor analysis of variance model with n observations per sample, let $S_i^2$,
$i = 1, \ldots, m$, denote the sample variances for the m samples. Show that

$SS_W = (n - 1)\sum_{i=1}^{m} S_i^2$
9. The following data relate to the ages at death of a certain species of rats that were
fed 1 of 3 types of diet. Thirty rats of a type having a short life span were randomly 
divided into 3 groups of size 10 each. The sample means and sample variances of 
the ages at death (measured in months) of the 3 groups are as follows: 

Very Low Calorie Moderate Calorie High Calorie 

Sample mean 22.4 16.8 13.7 

Sample variance 24.0 23.2 17.1 

Test the hypothesis, at the 5 percent level of significance, that the mean lifetime 
of a rat is not affected by its diet. What about at the 1 percent level? 

10. Plasma bradykininogen levels are related to the body's ability to resist inflammation.
In a 1968 study (Eilam, N., Johnson, P. K., Johnson, N. L., and Creger, W.,
"Bradykininogen levels in Hodgkin's disease," Cancer, 22, pp. 631-634), levels
were measured in normal patients, in patients with active Hodgkin's disease, and
in patients with inactive Hodgkin's disease. The following data (in micrograms of
bradykininogen per milliliter of plasma) resulted.

Normal    Active Hodgkin's Disease    Inactive Hodgkin's Disease
5.37      3.96                        5.37
5.80      3.04                        10.60
4.70      5.28                        5.02
5.70      3.40                        14.30
3.40      4.10                        9.90
8.60      3.61                        4.27
7.48      6.16                        5.75
5.77      3.22                        5.03
7.15      7.48                        5.74
6.49      3.87                        7.85
4.09      4.27                        6.82
5.94      4.05                        7.90
6.38      2.40                        8.36

Test, at the 5 percent level of significance, the hypothesis that the mean 
bradykininogen levels are the same for all three groups. 

11. A study of the trunk flexor muscle strength of 75 girls aged 3 to 7 was reported
by Baldauf, K., Swenson, D., Medeiros, J., and Radtka, S., "Clinical assessment
of trunk flexor muscle strength in healthy girls 3 to 7," Physical Therapy, 64, pp.
1203-1208, 1984. With muscle strength graded on a scale of 0 to 5, and with 15
girls in each age group, the following sample means and sample standard deviations
resulted.

Age                          3      4      5      6      7
Sample mean                  3.3    3.7    4.1    4.4    4.8
Sample standard deviation    .9     1.1    1.1    .9     .5



Test, at the 5 percent level of significance, the hypothesis that the mean trunk 
flexor strength is the same for all five age groups. 

12. An emergency room physician wanted to know whether there were any differences 
in the amount of time it takes for three different inhaled steroids to clear a mild 
asthmatic attack. Over a period of weeks she randomly administered these steroids 
to asthma sufferers, and noted the time it took for the patients' lungs to become 
clear. Afterward, she discovered that 12 patients had been treated with each type 
of steroid, with the following sample means (in minutes) and sample variances 
resulting. 



Steroid    $\bar{x}_i$    $s_i^2$
A          32             145
B          40             138
C          30             150



(a) Test the hypothesis that the mean time to clear a mild asthmatic attack is the 
same for all three steroids. Use the 5 percent level of significance. 

(b) Find confidence intervals for all quantities $\mu_i - \mu_j$ that, with 95 percent
confidence, are valid.

13. Five servings each of three different brands of processed meat were tested for fat 
content. The following data (in fat percentage per gram) resulted. 



Fat content

Brand 1    Brand 2    Brand 3
32         41         36
34         32         37
31         33         30
35         29         28
33         35         33



(a) Does the fat content differ depending on the brand? 

(b) Find confidence intervals for all quantities $\mu_i - \mu_j$ that, with 95 percent
confidence, are valid.

14. A nutritionist randomly divided 15 bicyclists into 3 groups of 5 each. The first 
group was given a vitamin supplement to take with each of their meals during the 
next 3 weeks. The second group was instructed to eat a particular type of high-fiber
whole-grain cereal for the next 3 weeks. The final group was instructed to eat as 
they normally do. After the 3-week period elapsed, the nutritionist had each of the 
bicyclists ride 6 miles. The following times were recorded. 

Vitamin group: 15.6 16.4 17.2 15.5 16.3 

Fiber cereal group: 17.1 16.3 15.8 16.4 16.0 
Control group: 15.9 17.2 16.4 15.4 16.8 

(a) Are the data consistent with the hypothesis that neither the vitamin nor 
the fiber cereal affected the bicyclists' speeds? Use the 5 percent level of 
significance. 

(b) Find confidence intervals for all quantities $\mu_i - \mu_j$ that, with 95 percent
confidence, are valid.

15. Test the hypothesis that the following three independent samples all come from 
the same normal probability distribution. 



Sample 1    Sample 2    Sample 3
35          29          44
37          38          52
29          34          56
27          30
30          32





16. For data $x_{ij}$, $i = 1, \ldots, m$, $j = 1, \ldots, n$, show that



i=\ 7=1 



17. If $x_{ij} = i + j$, determine



« £,- =1 E,- =1 ** 

18. If $x_{ij} = a_i + b_j$, show that

$\sum_{i=1}^{m}\sum_{j=1}^{n} x_{ij} = n\sum_{i=1}^{m} a_i + m\sum_{j=1}^{n} b_j$

19. A study has been made on pyrethrum flowers to determine the content of pyrethrin, 
a chemical used in insecticides. Four methods of extracting the chemical are used, 
and samples are obtained from flowers stored under three conditions: fresh flowers,
flowers stored for 1 year, and flowers stored for 1 year but treated. It is assumed
that there is no interaction present. The data are as follows:

Pyrethrin Content, Percent

Storage                  Method
Condition    A       B       C       D
1            1.35    1.13    1.06    .98
2            1.40    1.23    1.26    1.22
3            1.49    1.46    1.40    1.35



Suggest a model for the preceding information, and use the data to estimate its 
parameters. 

20. The following data refer to the number of deaths per 10,000 adults in a large 
Eastern city in the different seasons for the years 1982 to 1986. 



Year    Winter    Spring    Summer    Fall
1982    33.6      31.4      29.8      32.1
1983    32.5      30.1      28.5      29.9
1984    35.3      33.2      29.5      28.7
1985    34.4      28.6      33.9      30.1
1986    37.3      34.1      28.5      29.4



(a) Assuming a two-factor model, estimate the parameters. 

(b) Test the hypothesis that death rates do not depend on the season. Use the 
5 percent level of significance. 

(c) Test, at the 5 percent level of significance, the hypothesis that there is no 
effect due to the year. 

21. For the model of Problem 19: 

(a) Do the methods of extraction appear to differ? 

(b) Do the storage conditions affect the content? Test at the $\alpha = .05$ level of
significance.

22. Three different washing machines were employed to test four different detergents. 
The following data give a coded score of the effectiveness of each washing. 

(a) Estimate the improvement in mean value when using detergent 1 over using 
detergents (i) 2; (ii) 3; (iii) 4. 

(b) Estimate the improvement in mean value when using machine 3 as opposed 
to using machine (i) 1; (ii) 2. 



                Machine
                1     2     3
Detergent 1     53    50    59
Detergent 2     54    54    60
Detergent 3     56    58    62
Detergent 4     50    45    57



(c) Test the hypothesis that the detergent used does not affect the score. 

(d) Test the hypothesis that the machine used does not affect the score. 

Use, in both (c) and (d), the 5 percent level of significance. 

23. An experiment was devised to test the effects of running 3 different types of gasoline 
with 3 possible types of additive. The experiment called for 9 identical motors to 
be run with 5 gallons for each of the pairs of gasoline and additives. The following 
data resulted. 



Mileage Obtained

                Additive
Gasoline    1        2        3
1           124.1    131.5    127
2           126.4    130.6    128.4
3           127.2    132.7    125.6



(a) Test the hypothesis that the gasoline used does not affect the mileage. 

(b) Test the hypothesis that the additives are equivalent. 

(c) What assumptions are you making? 

24. Suppose in Problem 6 that the 10 people placed on each diet consisted of 5 men 
and 5 women, with the following data. 





          Diet 1    Diet 2
Women     7.6       19.5
          8.8       17.6
          12.5      16.8
          16.1      13.7
          18.6      21.5
Men       22.2      30.1
          23.4      24.2
          24.2      9.5
          32.2      14.6
          9.4       11.2

(a) Test the hypothesis that there is no interaction between gender and diet.

(b) Test the hypothesis that the diet has the same effect on men and women. 

25. A researcher is interested in comparing the breaking strength of different laminated 
beams made from 3 different types of glue and 3 varieties of wood. To make the 
comparison, 5 beams of each of the 9 combinations were manufactured and then 
put under a stress test. The following table indicates the pressure readings at which 
each of the beams broke. 



Wood \ Glue    G1                         G2                         G3
W1             196, 208, 247, 221, 216    214, 216, 235, 252, 240    258, 250, 264, 272, 248
W2             216, 228, 240, 236, 224    215, 217, 235, 241, 219    246, 247, 261, 255, 250
W3             230, 242, 232, 228, 244    212, 218, 216, 222, 224    255, 251, 261, 247, 258



(a) Test the hypothesis that the wood and glue effect is additive. 

(b) Test the hypothesis that the wood used does not affect the breaking strength. 

(c) Test the hypothesis that the glue used does not affect the breaking strength. 

26. A study was made as to how the concentration of a certain drug in the blood, 
24 hours after being injected, is influenced by age and gender. An analysis of the 
blood samples of 40 people given the drug yielded the following concentrations 
(in milligrams per cubic centimeter). 







                  Age Group
          11-25    26-40    41-65    Over 65
Male      52       52.5     53.2     82.4
          56.6     49.6     53.6     86.2
          68.2     48.7     49.8     101.3
          82.5     44.6     50.0     92.4
          85.6     43.4     51.2     78.6
Female    68.6     60.2     58.7     82.2
          80.4     58.4     55.9     79.6
          86.2     56.2     56.0     81.4
          81.3     54.2     57.2     80.6
          77.2     61.1     60.0     82.2

(a) Test the hypothesis of no age and gender interaction.

(b) Test the hypothesis that gender does not affect the blood concentration. 

(c) Test the hypothesis that age does not affect blood concentration. 

27. Suppose, in Problem 23, that there has been some controversy about the assumption
of no interaction between gasoline and additive used. To allow for the
possibility of an interaction effect between gasoline and additive, it was decided to
run 36 motors, 4 in each grouping. The following data resulted.







                Additive
Gasoline    1        2        3
1           126.2    130.4    127
            124.8    131.6    126.6
            125.3    132.5    129.4
            127.0    128.6    130.1
2           127.2    142.1    129.5
            126.6    132.6    142.6
            125.8    128.5    140.5
            128.4    131.2    138.7
3           127.1    132.3    125.2
            128.3    134.1    123.3
            125.1    130.6    122.6
            124.9    133.0    120.9



(a) Do the data indicate an interaction effect? 

(b) Do the gasolines appear to give equal results? 

(c) Test whether or not there is an additive effect or whether all additives work 
equally well. 

(d) What conclusions can you draw? 

28. An experiment has been devised to test the hypothesis that an elderly person's 
memory retention can be improved by a set of "oxygen treatments." A group of 
scientists administered these treatments to men and women. The men and women 
were each randomly divided into 4 groups of 5 each, and the people in the $i$th
group were given treatments over an $(i - 1)$-week interval, $i = 1, 2, 3, 4$. (The
2 groups not given any treatments served as "controls.") The treatments were set 
up in such a manner that all individuals thought they were receiving the oxygen 
treatments for the total 3 weeks. After treatment ended, a memory retention 
test was administered. The results (with higher scores indicating higher memory 
retentions) are shown in the table. 

(a) Test whether or not there is an interaction effect. 

(b) Test the hypothesis that the length of treatment does not affect memory 
retention. 
(c) Is there a gender difference?

(d) A randomly chosen group of 5 elderly men, without receiving any oxygen 
treatment, were given the memory retention test. Their scores were 37, 35, 
33, 39, 29. What conclusions can you draw? 

Scores

          Number of Weeks of Oxygen Treatment
          0     1     2     3
Men       42    39    38    42
          54    52    50    55
          46    51    47    39
          38    50    45    38
          51    47    43    51
Women     49    48    27    61
          44    51    42    55
          50    52    47    45
          45    54    53    40
          43    40    58    42



29. In a study of platelet production, 16 rats were put at an altitude of 15,000 feet, 
while another 16 were kept at sea level (Rand, K., Anderson, T., Lukis, G., and 
Creger, W., "Effect of hypoxia on platelet level in the rat," Clinical Research, 18, 



Altitude 



Sea Level 



Spleen Removed


Normal Spleen 


528 


434 


444 


331 


338 


312 


342 


575 


338 


472 


331 


444 


288 


575 


319 


384 


294 


272 


254 


275 


352 


350 


241 


350 


291 


466 


175 


388 


241 


425 


238 


344 
p. 178, 1970). Half of the rats in both groups had their spleens removed. The
fibrinogen levels on day 21 are reported below.

(a) Test the hypothesis that there are no interactions. 

(b) Test the hypothesis that there is no effect due to altitude. 

(c) Test the hypothesis that there is no effect due to spleen removal. In all cases, 
use the 5 percent level of significance. 

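30. Suppose that $\mu, \alpha_1, \ldots, \alpha_m, \beta_1, \ldots, \beta_n$ and $\mu', \alpha_1', \ldots, \alpha_m', \beta_1', \ldots, \beta_n'$ are such
that

$\sum_{i=1}^{m}\alpha_i = \sum_{j=1}^{n}\beta_j = \sum_{i=1}^{m}\alpha_i' = \sum_{j=1}^{n}\beta_j' = 0$

and

$\mu + \alpha_i + \beta_j = \mu' + \alpha_i' + \beta_j' \quad \text{for all } i, j$

Show that

$\mu = \mu', \quad \alpha_i = \alpha_i', \quad \beta_j = \beta_j'$

for all $i$ and $j$. This shows that the parameters $\mu, \alpha_1, \ldots, \alpha_m, \beta_1, \ldots, \beta_n$ in our
representation of two-factor ANOVA are uniquely determined.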




GOODNESS OF FIT TESTS AND 
CATEGORICAL DATA ANALYSIS 



11.1 INTRODUCTION

We are often interested in determining whether or not a particular probabilistic model 
is appropriate for a given random phenomenon. This determination often reduces to 
testing whether a given random sample comes from some specified, or partially specified, 
probability distribution. For example, we may a priori feel that the number of industrial 
accidents occurring daily at a particular plant should constitute a random sample from 
a Poisson distribution. This hypothesis can then be tested by observing the number of 
accidents over a sequence of days and then testing whether it is reasonable to suppose 
that the underlying distribution is Poisson. Statistical tests that determine whether a given 
probabilistic mechanism is appropriate are called goodness of fit tests. 

The classical approach to obtaining a goodness of fit test of a null hypothesis that
a sample has a specified probability distribution is to partition the possible values of the
random variables into a finite number of regions. The numbers of the sample values
that fall within each region are then determined and compared with the theoretical
expected numbers under the specified probability distribution, and when they are significantly
different the null hypothesis is rejected. The details of such a test are presented
in Section 11.2, where it is assumed that the null hypothesis probability distribution is
completely specified. In Section 11.3, we show how to do the analysis when some of the
parameters of the null hypothesis distribution are left unspecified; that is, for instance, the
null hypothesis might be that the sample distribution is a normal distribution, without
specifying the mean and variance of this distribution. In Sections 11.4 and 11.5, we consider
situations where each member of a population is classified according to two distinct
characteristics, and we show how to use our previous analysis to test the hypothesis that
the characteristics of a randomly chosen member of the population are independent. As
an application, we show how to test the hypothesis that m populations all have the same
discrete probability distribution. Finally, in the optional section, Section 11.6, we return
to the problem of testing that sample data come from a specified probability distribution,
which we now assume is continuous. Rather than discretizing the data so as to be able to
use the test of Section 11.2, we treat the data as given and make use of the
Kolmogorov-Smirnov test.

11.2 GOODNESS OF FIT TESTS WHEN ALL PARAMETERS
ARE SPECIFIED

Suppose that $n$ independent random variables $Y_1, \ldots, Y_n$, each taking on one of the
values $1, 2, \ldots, k$, are to be observed and we are interested in testing the null hypothesis
that $\{p_i, i = 1, \ldots, k\}$ is the probability mass function of the $Y_j$. That is, if $Y$ represents
any of the $Y_j$, then the null hypothesis is

$H_0: P\{Y = i\} = p_i, \quad i = 1, \ldots, k$

whereas the alternative hypothesis is

$H_1: P\{Y = i\} \neq p_i, \quad \text{for some } i = 1, \ldots, k$

To test the foregoing hypothesis, let $X_i$, $i = 1, \ldots, k$, denote the number of the $Y_j$'s that
equal $i$. Then as each $Y_j$ will independently equal $i$ with probability $P\{Y = i\}$, it follows
that, under $H_0$, $X_i$ is binomial with parameters $n$ and $p_i$. Hence, when $H_0$ is true,

$E[X_i] = np_i$

and so $(X_i - np_i)^2$ will be an indication as to how likely it appears that $p_i$ indeed equals
the probability that $Y = i$. When this is large, say, in relationship to $np_i$, then it is an
indication that $H_0$ is not correct. Indeed such reasoning leads us to consider the following
test statistic:

$T = \sum_{i=1}^{k} \frac{(X_i - np_i)^2}{np_i} \qquad (11.2.1)$

and to reject the null hypothesis when $T$ is large.
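Equation 11.2.1 translates directly into code. Below is a minimal Python sketch; the function name `pearson_T` and its argument layout (observed counts $X_1, \ldots, X_k$ and hypothesized $p_1, \ldots, p_k$ as parallel lists) are illustrative assumptions, and the counts used are those of Example 11.2b later in this section.

```python
# Goodness of fit statistic T of Equation 11.2.1:
#   T = sum_i (X_i - n p_i)^2 / (n p_i)
# counts[i] is X_{i+1}, the number of observations equal to value i + 1;
# probs[i] is the hypothesized probability p_{i+1} under H0.

def pearson_T(counts, probs):
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length k")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("hypothesized probabilities must sum to 1")
    n = sum(counts)                      # total number of observations
    return sum((x - n * p) ** 2 / (n * p) for x, p in zip(counts, probs))

# Example 11.2b: 30 bulbs classified into quality levels A, B, C, D, E.
counts = [3, 6, 9, 7, 5]
probs = [0.15, 0.25, 0.35, 0.20, 0.05]
print(round(pearson_T(counts, probs), 3))   # 9.348
```

The two checks at the top simply reject inputs for which Equation 11.2.1 is not defined as stated; they are a design choice of this sketch rather than part of the test.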

To determine the critical region, we need first to specify a significance level $\alpha$, and then
we must determine the critical value $c$ such that

$P_{H_0}\{T \ge c\} = \alpha$

That is, we need to determine $c$ so that the probability that the test statistic $T$ is at least as
large as $c$, when $H_0$ is true, is $\alpha$. The test is then to reject the hypothesis, at the $\alpha$ level of
significance, when $T \ge c$ and to accept when $T < c$.

It remains to determine $c$. The classical approach to doing so is to use the result that
when $n$ is large, $T$ will have, when $H_0$ is true, approximately (with the approximation
becoming exact as $n$ approaches infinity) a chi-square distribution with $k - 1$ degrees of
freedom. Hence, for $n$ large, $c$ can be taken to equal $\chi^2_{\alpha,k-1}$, and so the approximate
$\alpha$ level test is

reject $H_0$ if $T \ge \chi^2_{\alpha,k-1}$
accept $H_0$ otherwise

If the observed value of $T$ is $T = t$, then the preceding test is equivalent to rejecting $H_0$
if the significance level $\alpha$ is at least as large as the p-value given by

p-value $= P_{H_0}\{T \ge t\} \approx P\{\chi^2_{k-1} \ge t\}$

where $\chi^2_{k-1}$ is a chi-square random variable with $k - 1$ degrees of freedom.
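Where Program 5.8.1a is not at hand, the approximate p-value $P\{\chi^2_{k-1} \ge t\}$ can be computed with the Python standard library alone. The sketch below relies on the fact that a chi-square variable with $\nu$ degrees of freedom is a gamma variable with shape $\nu/2$ and scale 2, and sums the power series of the regularized lower incomplete gamma function; the name `chi2_sf` is our own, and the series is meant for the moderate arguments met in this chapter, not as a robust general-purpose routine (for very small p-values the subtraction from 1 loses accuracy).

```python
import math

def chi2_sf(t, df):
    """Approximate P{chi-square with df degrees of freedom >= t}.

    Uses gamma_lower(a, x) = x^a e^(-x) * sum_{n>=0} x^n / (a (a+1) ... (a+n))
    with a = df/2 and x = t/2, then returns 1 minus the regularized value.
    """
    if t <= 0:
        return 1.0
    a, x = df / 2.0, t / 2.0
    term = 1.0 / a            # n = 0 term of the series
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= x / (a + k)   # ratio of consecutive series terms
        total += term
        k += 1
    lower = total * math.exp(a * math.log(x) - x - math.lgamma(a))   # P{chi2 < t}
    return max(0.0, 1.0 - lower)

print(round(chi2_sf(17.192, 11), 4))   # Example 11.2a, 12 categories
print(round(chi2_sf(14.775, 3), 4))    # Example 11.2a, 4 categories
print(round(chi2_sf(9.348, 4), 4))     # Example 11.2b
```

The three calls use the statistics computed in Examples 11.2a and 11.2b; the results should be close to the p-values .1023, .002, and .053 reported there.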

An accepted rule of thumb as to how large $n$ needs to be for the foregoing to be a good
approximation is that it should be large enough so that $np_i \ge 1$ for each $i = 1, \ldots, k$,
and also at least 80 percent of the values $np_i$ should exceed 5.

REMARKS 

(a) A computationally simpler formula for $T$ can be obtained by expanding the square
in Equation 11.2.1 and using the results that $\sum_i p_i = 1$ and $\sum_i X_i = n$ (why is this
true?):

$T = \sum_i \frac{X_i^2 - 2np_iX_i + n^2p_i^2}{np_i} \qquad (11.2.2)$

$\quad = \sum_i X_i^2/(np_i) - 2\sum_i X_i + n\sum_i p_i$

$\quad = \sum_i X_i^2/(np_i) - n$

(b) The intuitive reason why $T$, which depends on the $k$ values $X_1, \ldots, X_k$, has only $k - 1$
degrees of freedom is that 1 degree of freedom is lost because of the linear relationship
$\sum_i X_i = n$.

(c) Although the proof that, asymptotically, $T$ has a chi-square distribution is advanced, it
can be easily shown when $k = 2$. In this case, since $X_1 + X_2 = n$ and $p_1 + p_2 = 1$, we see
that

$T = \frac{(X_1 - np_1)^2}{np_1} + \frac{(X_2 - np_2)^2}{np_2}$

$\quad = \frac{(X_1 - np_1)^2}{np_1} + \frac{(n - X_1 - n[1 - p_1])^2}{n(1 - p_1)}$

$\quad = \frac{(X_1 - np_1)^2}{np_1} + \frac{(X_1 - np_1)^2}{n(1 - p_1)}$

$\quad = \frac{(X_1 - np_1)^2}{np_1(1 - p_1)}$

since $\frac{1}{p} + \frac{1}{1 - p} = \frac{1}{p(1 - p)}$.

However, $X_1$ is a binomial random variable with mean $np_1$ and variance $np_1(1 - p_1)$,
and thus, by the normal approximation to the binomial, it follows that
$(X_1 - np_1)/\sqrt{np_1(1 - p_1)}$ has, for large $n$, approximately a standard normal
distribution; and so its square has approximately a chi-square distribution with 1 degree of freedom.
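Remarks (a) and (c) can both be confirmed numerically. In this Python sketch (helper names of our own choosing), the shortcut form (11.2.2) is compared with the definition (11.2.1) on the twelve monthly counts of Table 11.1 used in Example 11.2a, and the two-category identity of Remark (c) is checked at hypothetical values $n = 100$, $p_1 = .3$, $X_1 = 38$.

```python
def T_definition(counts, probs):
    """Equation 11.2.1: sum of (X_i - n p_i)^2 / (n p_i)."""
    n = sum(counts)
    return sum((x - n * p) ** 2 / (n * p) for x, p in zip(counts, probs))

def T_shortcut(counts, probs):
    """Equation 11.2.2: sum of X_i^2 / (n p_i), minus n."""
    n = sum(counts)
    return sum(x * x / (n * p) for x, p in zip(counts, probs)) - n

# Remark (a): the twelve counts of Table 11.1 under p_i = 1/12.
months = [90, 100, 87, 96, 101, 86, 119, 118, 121, 114, 113, 106]
uniform = [1 / 12] * 12
print(round(T_definition(months, uniform), 3), round(T_shortcut(months, uniform), 3))

# Remark (c): k = 2 at hypothetical values n = 100, p_1 = 0.3, X_1 = 38.
n, p1, x1 = 100, 0.3, 38
two_cell = T_definition([x1, n - x1], [p1, 1 - p1])
squared_z = (x1 - n * p1) ** 2 / (n * p1 * (1 - p1))   # square of the standardized X_1
print(round(two_cell, 6), round(squared_z, 6))
```

Both forms give 17.192 for the monthly data, and the two-cell statistic equals the squared standardized binomial count (64/21, about 3.047619), as Remark (c) asserts.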

EXAMPLE 11.2a In recent years, a correlation between mental and physical well-being
has increasingly become accepted. An analysis of birthdays and death days of famous 
people could be used as further evidence in the study of this correlation. To use these 
data, we are supposing that being able to look forward to something betters a person's 
mental state; and that a famous person would probably look forward to his or her birth- 
day because of the resulting attention, affection, and so on. If a famous person is in 
poor health and dying, then perhaps anticipating his birthday would "cheer him up and 
therefore improve his health and possibly decrease the chance that he will die shortly 
before his birthday." The data might therefore reveal that a famous person is less likely 
to die in the months before his or her birthday and more likely to die in the months 
afterward. 

SOLUTION To test this, a sample of 1,251 (deceased) Americans was randomly chosen 
from Who Was Who in America, and their birth and death days were noted. (The data 
are taken from D. Phillips, "Death Day and Birthday: An Unexpected Connection," in 
Statistics: A Guide to the Unknown, Holden-Day, 1972.) The data are summarized in 
Table 11.1. 

If the death day does not depend on the birthday, then it would seem that each of the 
1,251 individuals would be equally likely to fall in any of the 12 categories. Thus, let us 
test the null hypothesis 



$H_0: p_i = \frac{1}{12}, \quad i = 1, \ldots, 12$



TABLE 11.1 Number of Deaths Before, During, and After the Birth Month

Month of Death Relative to Birth Month    Number of Deaths
6 months before                           90
5 months before                           100
4 months before                           87
3 months before                           96
2 months before                           101
1 month before                            86
The month                                 119
1 month after                             118
2 months after                            121
3 months after                            114
4 months after                            113
5 months after                            106

n = 1,251;  n/12 = 104.25



Since $1{,}251/12 = 104.25$, the chi-square test statistic for this hypothesis is

$T = \frac{(90)^2 + (100)^2 + (87)^2 + \cdots + (106)^2}{104.25} - 1{,}251 = 17.192$



The p-value is

p-value $\approx P\{\chi^2_{11} > 17.192\}$
$\quad = 1 - .8977 = .1023$ by Program 5.8.1a

The results of this test leave us somewhat up in the air about the hypothesis that an 
approaching birthday has no effect on an individual's remaining lifetime. For whereas the 
data are not quite strong enough (at least, at the 10 percent level of significance) to reject 
this hypothesis, they are certainly suggestive of its possible falsity. This raises the possibility 
that perhaps we should not have allowed as many as 12 data categories, and that we might
have obtained a more powerful test by allowing fewer possible outcomes.
For instance, let us determine what the result would have been if we had coded the data 
into 4 possible outcomes as follows: 

outcome 1 = -6, -5, -4
outcome 2 = -3, -2, -1
outcome 3 = 0, 1, 2
outcome 4 = 3, 4, 5

That is, for instance, an individual whose death day occurred 3 months before his or her 
birthday would be placed in outcome 2. With this classification, the data would be as 
follows: 



Outcome    Number of Times Occurring
1          277
2          283
3          358
4          333

n = 1,251;  n/4 = 312.75
The test statistic for testing $H_0: p_i = 1/4$, $i = 1, 2, 3, 4$, is

$T = \frac{(277)^2 + (283)^2 + (358)^2 + (333)^2}{312.75} - 1{,}251 = 14.775$

Hence, as $\chi^2_{.01,3} = 11.345$, the null hypothesis would be rejected even at the 1 percent level
of significance. Indeed, using Program 5.8.1a yields that

p-value $\approx P\{\chi^2_3 > 14.775\} = 1 - .998 = .002$

The foregoing analysis is, however, subject to the criticism that the null hypothesis 
was chosen after the data were observed. Indeed, while there is nothing incorrect about 
using a set of data to determine the "correct way" of phrasing a null hypothesis, the 
additional use of those data to test that very hypothesis is certainly questionable. Therefore, 
to be quite certain of the conclusion to be drawn from this example, it seems prudent 
to choose a second random sample — coding the values as before — and again test 
$H_0: p_i = 1/4$, $i = 1, 2, 3, 4$ (see Problem 3). ■
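The regrouping used in Example 11.2a can be expressed as a small mapping from a month offset (-6 through 5, with 0 the birth month) to one of the four outcomes. This Python sketch, with the hypothetical helper `outcome`, rebuilds the four counts from Table 11.1 and recomputes $T$ with the shortcut formula (11.2.2).

```python
# Collapse the twelve categories of Table 11.1 into the four outcomes of
# Example 11.2a, then recompute T under H0: p_i = 1/4 via Equation 11.2.2.

months = [90, 100, 87, 96, 101, 86, 119, 118, 121, 114, 113, 106]
offsets = list(range(-6, 6))   # -6, ..., -1, 0 (birth month), 1, ..., 5

def outcome(offset):
    """Map offsets {-6,-5,-4}, {-3,-2,-1}, {0,1,2}, {3,4,5} to outcomes 1, 2, 3, 4."""
    return (offset + 6) // 3 + 1

grouped = [0, 0, 0, 0]
for off, count in zip(offsets, months):
    grouped[outcome(off) - 1] += count

n = sum(grouped)
expected = n / 4               # n p_i with p_i = 1/4
T = sum(x * x for x in grouped) / expected - n
print(grouped, round(T, 3))    # [277, 283, 358, 333] 14.775
```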

Program 11.2.1 can be used to quickly calculate the value of T.

EXAMPLE 11.2b A contractor who purchases a large number of fluorescent lightbulbs has
been told by the manufacturer that these bulbs are not of uniform quality but rather have 
been produced in such a way that each bulb produced will, independently, either be of 
quality level A, B, C, D, or E, with respective probabilities .15, .25, .35, .20, .05. However, 
the contractor feels that he is receiving too many type E (the lowest quality) bulbs, and 
so he decides to test the producer's claim by taking the time and expense to ascertain the 
quality of 30 such bulbs. Suppose that he discovers that of the 30 bulbs, 3 are of quality 
level A, 6 are of quality level B, 9 are of quality level C, 7 are of quality level D, and 5 are of 
quality level E. Do these data, at the 5 percent level of significance, enable the contractor 
to reject the producer's claim? 

SOLUTION Program 11.2.1 gives the value of the test statistic as 9.348. Therefore,

p-value $= P_{H_0}\{T \ge 9.348\}$
$\quad \approx P\{\chi^2_4 \ge 9.348\}$
$\quad = 1 - .947$ from Program 5.8.1a
$\quad = .053$

Thus the hypothesis would not be rejected at the 5 percent level of significance (but since 
it would be rejected at all significance levels above .053, the contractor should certainly 
remain skeptical). ■ 
11.2.1 Determining the Critical Region by Simulation

From 1900, when Karl Pearson first showed that $T$ has approximately (becoming exact as
$n$ approaches infinity) a chi-square distribution with $k - 1$ degrees of freedom, until quite
recently, this approximation was the only means available for determining the p-value of
the goodness of fit test. However, with the recent advent of inexpensive, fast, and easily
available computational power, a second, potentially more accurate, approach has become
available: namely, the use of simulation to obtain the p-value of the test statistic to a high
level of accuracy.

The simulation approach is as follows. First, the value of T is determined, say, $T = t$. Now to determine whether or not to accept $H_0$, at a given significance level $\alpha$, we need to know the probability that T would be at least as large as t when $H_0$ is true. To determine this probability, we simulate n independent random variables $Y_1^{(1)}, \ldots, Y_n^{(1)}$, each having the probability mass function $\{p_i, i = 1, \ldots, k\}$; that is,

$$P\{Y_j^{(1)} = i\} = p_i, \qquad i = 1, \ldots, k, \quad j = 1, \ldots, n$$

Now let

$$X_i^{(1)} = \text{number } j : Y_j^{(1)} = i$$

and set

$$T^{(1)} = \sum_{i=1}^{k} \frac{(X_i^{(1)} - np_i)^2}{np_i}$$

Now repeat this procedure by simulating a second set, independent of the first set, of n independent random variables $Y_1^{(2)}, \ldots, Y_n^{(2)}$, each having the probability mass function $\{p_i, i = 1, \ldots, k\}$, and then, as for the first set, determining $T^{(2)}$. Repeating this a large number, say, r, of times yields r independent random variables $T^{(1)}, T^{(2)}, \ldots, T^{(r)}$, each of which has the same distribution as does the test statistic T when $H_0$ is true. Hence, by the law of large numbers, the proportion of the $T^{(l)}$ that are as large as t will be very nearly equal to the probability that T is as large as t when $H_0$ is true; that is,

$$\frac{\text{number } l : T^{(l)} \ge t}{r} \approx P_{H_0}\{T \ge t\}$$

In fact, by letting r be large, the foregoing can be considered to be, with high probability, almost an equality. Hence, if that proportion is less than or equal to $\alpha$, then the p-value, equal to the probability of observing a T as large as t when $H_0$ is true, is (approximately) at most $\alpha$, and so $H_0$ should be rejected.
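A minimal Python sketch of this simulation procedure follows. It uses Python's `random.choices` to generate the $Y_j$ (Remark (a) below describes how to do this by hand from a single random number). The helper names `chi_square_statistic` and `simulated_p_value` are hypothetical, and the code illustrates the idea rather than reproducing the book's Program 11.2.2. With the data of Example 11.2d it should give an estimate near the .1843 reported there, though the exact value depends on the seed.

```python
import random


def chi_square_statistic(counts, probs):
    """T = sum_i (X_i - n p_i)^2 / (n p_i), where n = sum of the counts."""
    n = sum(counts)
    return sum((x - n * p) ** 2 / (n * p) for x, p in zip(counts, probs))


def simulated_p_value(t, probs, n, runs=10_000, seed=None):
    """Estimate P_{H0}{T >= t} as the proportion of r = `runs` simulated
    statistics T^(1), ..., T^(r) that are at least as large as t."""
    rng = random.Random(seed)
    outcomes = range(len(probs))
    hits = 0
    for _ in range(runs):
        # Y_1, ..., Y_n with P{Y_j = i} = p_i (outcomes coded 0, ..., k-1 here)
        counts = [0] * len(probs)
        for y in rng.choices(outcomes, weights=probs, k=n):
            counts[y] += 1
        if chi_square_statistic(counts, probs) >= t:
            hits += 1
    return hits / runs


# Example 11.2d data: six outcomes, 40 replications.
probs = [0.10, 0.10, 0.05, 0.40, 0.20, 0.15]
observed = [3, 3, 5, 18, 4, 7]
t = chi_square_statistic(observed, probs)  # about 7.4167
print(round(t, 4), simulated_p_value(t, probs, n=40, seed=1))
```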
REMARKS

(a) To utilize the foregoing simulation approach to determine whether or not to accept $H_0$ when T is observed, we need to specify how one can simulate, or generate, a random variable Y such that $P\{Y = i\} = p_i$, $i = 1, \ldots, k$. One way is as follows:

Step 1: Generate a random number U.
Step 2: If

$$p_1 + \cdots + p_{i-1} < U \le p_1 + \cdots + p_i$$

set $Y = i$ (where $p_1 + \cdots + p_{i-1} \equiv 0$ when $i = 1$). That is,

$$\begin{aligned}
U \le p_1 &\;\Rightarrow\; Y = 1\\
p_1 < U \le p_1 + p_2 &\;\Rightarrow\; Y = 2\\
&\;\;\vdots\\
p_1 + \cdots + p_{i-1} < U \le p_1 + \cdots + p_i &\;\Rightarrow\; Y = i\\
&\;\;\vdots\\
p_1 + \cdots + p_{k-1} < U &\;\Rightarrow\; Y = k
\end{aligned}$$

Since a random number is equivalent to a uniform (0, 1) random variable, we have that

$$P\{a < U < b\} = b - a, \qquad 0 < a < b < 1$$

and so

$$P\{Y = i\} = P\{p_1 + \cdots + p_{i-1} < U \le p_1 + \cdots + p_i\} = p_i$$
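The two steps of Remark (a) amount to inverting the partial sums $p_1, p_1 + p_2, \ldots$. A short Python sketch follows; the name `make_discrete_sampler` is our own, and it uses binary search over the partial sums, though a linear scan would work equally well for small k.

```python
import bisect
import itertools
import random


def make_discrete_sampler(probs, rng=None):
    """Return a function that generates Y with P{Y = i} = probs[i - 1],
    i = 1, ..., k, following Remark (a): draw U and set Y = i when
    p_1 + ... + p_{i-1} < U <= p_1 + ... + p_i."""
    rng = rng or random.Random()
    cumulative = list(itertools.accumulate(probs))  # p_1, p_1 + p_2, ...

    def sample():
        u = rng.random()  # a random number, uniform on [0, 1)
        i = bisect.bisect_left(cumulative, u)  # first index with cumulative[i] >= u
        # Guard against the last partial sum falling slightly below 1 from rounding.
        return min(i, len(probs) - 1) + 1

    return sample


draw = make_discrete_sampler([0.10, 0.10, 0.05, 0.40, 0.20, 0.15], random.Random(0))
print([draw() for _ in range(10)])
```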

(b) A significant question that remains is how many simulation runs are necessary. It has 
been shown that the value r = 100 is usually sufficient at the conventional 5 percent level 
of significance.* 

EXAMPLE 11.2c Let us reconsider the problem presented in Example 11.2b. A simulation study yielded the result

$$P_{H_0}\{T < 9.52381\} = .95$$

and so the critical value should be 9.52381, which is remarkably close to $\chi^2_{.05,4} = 9.488$, given as the critical value by the chi-square approximation. This is most interesting since the rule of thumb for when the chi-square approximation can be applied, namely, that each $np_i \ge 1$ and at least 80 percent of the $np_i$ exceed 5, does not apply, thus raising the possibility that it is rather conservative. ■

* See Hope, A., "A Simplified Monte Carlo Significance Test Procedure," J. of Royal Statist. Soc., B. 30, 582-598, 1968.

Program 11.2.2 can be utilized to determine the p-value.

To obtain more information as to how well the chi-square approximation performs, 
consider the following example. 

EXAMPLE 11.2d Consider an experiment having six possible outcomes whose probabilities are hypothesized to be .1, .1, .05, .4, .2, and .15. This is to be tested by performing 40 independent replications of the experiment. If the resultant number of times that each of the six outcomes occurs is 3, 3, 5, 18, 4, 7, should the hypothesis be accepted?

SOLUTION A direct computation, or the use of Program 11.2.1, yields that the value of the test statistic is 7.4167. Utilizing Program 5.8.1a gives the result that

$$P\{\chi^2_5 < 7.4167\} = .8088$$

and thus

$$p\text{-value} \approx .1912$$

To check the foregoing approximation, we ran Program 11.2.2, using 10,000 simulation runs, and obtained an estimate of the p-value equal to .1843 (see Figure 11.1).
FIGURE 11.1 Program 11.2.2, estimating the p-value in the goodness of fit test by simulation. Inputs: the hypothesized probabilities p, sample size 40, 10,000 simulation runs, and test statistic value 7.416667. Output: "The estimate of the p-value is 0.1843".
Since the number of the 10,000 simulated values that exceed 7.4167 is a binomial random variable with parameters $n = 10{,}000$ and $p = p$-value, it follows that a 90 percent confidence interval for the p-value is

$$p\text{-value} \in .1843 \pm 1.645\sqrt{.1843(.8157)/10^4}$$

That is, with 90 percent confidence

$$p\text{-value} \in (.1779, .1907) \qquad ■$$
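The interval arithmetic above can be checked with a few lines of Python. The sketch simply plugs the simulation estimate .1843, $r = 10{,}000$ runs, and $z_{.05} = 1.645$ into the normal approximation to the binomial; the variable names are our own.

```python
import math

r = 10_000        # simulation runs
p_hat = 0.1843    # proportion of simulated T's that were >= 7.4167
z = 1.645         # z_{.05}, for a two-sided 90 percent interval

# Normal approximation to the binomial(r, p-value) count of exceedances
half_width = z * math.sqrt(p_hat * (1 - p_hat) / r)
interval = (round(p_hat - half_width, 4), round(p_hat + half_width, 4))
print(interval)  # (0.1779, 0.1907)
```

Because the half-width shrinks like $1/\sqrt{r}$, halving it requires roughly four times as many simulation runs.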

11.3 GOODNESS OF FIT TESTS WHEN SOME 
PARAMETERS ARE UNSPECIFIED 

We can also perform goodness of fit tests of a null hypothesis that does not completely specify the probabilities $\{p_i, i = 1, \ldots, k\}$. For instance, consider the situation previously mentioned in which one is interested in testing whether the number of accidents occurring daily in a certain industrial plant is Poisson distributed with some unknown mean $\lambda$. To test this hypothesis, suppose that the daily number of accidents is recorded for n days; let $Y_1, \ldots, Y_n$ be these data. To analyze these data we must first address the difficulty that the $Y_i$ can assume an infinite number of possible values. However, this is easily dealt with by breaking up the possible values into a finite number k of regions and then considering the region in which each $Y_i$ falls. For instance, we might say that the outcome of the number of accidents on a given day is in region 1 if there are 0 accidents, region 2 if there is 1 accident, region 3 if there are 2 or 3 accidents, region 4 if there are 4 or 5 accidents, and region 5 if there are more than 5 accidents. Hence, if the distribution is indeed Poisson with mean $\lambda$, then



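$$\begin{aligned}
p_1 &= P\{Y = 0\} = e^{-\lambda} \qquad (11.3.1)\\
p_2 &= P\{Y = 1\} = \lambda e^{-\lambda}\\
p_3 &= P\{Y = 2\} + P\{Y = 3\} = \frac{e^{-\lambda}\lambda^2}{2} + \frac{e^{-\lambda}\lambda^3}{6}\\
p_4 &= P\{Y = 4\} + P\{Y = 5\} = \frac{e^{-\lambda}\lambda^4}{24} + \frac{e^{-\lambda}\lambda^5}{120}\\
p_5 &= P\{Y > 5\} = 1 - e^{-\lambda} - \lambda e^{-\lambda} - \frac{e^{-\lambda}\lambda^2}{2} - \frac{e^{-\lambda}\lambda^3}{6} - \frac{e^{-\lambda}\lambda^4}{24} - \frac{e^{-\lambda}\lambda^5}{120}
\end{aligned}$$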



The second difficulty we face in obtaining a goodness of fit test results from the fact that the mean value $\lambda$ is not specified. Clearly, the intuitive thing to do is to assume that $H_0$ is true and then estimate $\lambda$ from the data, say, $\hat\lambda$ is the estimate of $\lambda$, and then



494 Chapter 11: Goodness of Fit Tests and Categorical Data Analysis



compute the test statistic

$$T = \sum_{i=1}^{k} \frac{(X_i - n\hat p_i)^2}{n\hat p_i}$$

where $X_i$ is, as before, the number of $Y_j$ that fall in region i, $i = 1, \ldots, k$, and $\hat p_i$ is the estimated probability of the event that $Y_j$ falls in region i, which is determined by substituting $\hat\lambda$ for $\lambda$ in expression (11.3.1) for $p_i$.

In general, this approach can be utilized whenever there are unspecified parameters in the null hypothesis that are needed to compute the quantities $p_i$, $i = 1, \ldots, k$. Suppose now that there are m such unspecified parameters and that they are to be estimated by the method of maximum likelihood. It can then be proven that when n is large, the test statistic T will have, when $H_0$ is true, approximately a chi-square distribution with $k - 1 - m$ degrees of freedom. (In other words, one degree of freedom is lost for each parameter that needs to be estimated.) The test is, therefore, to

reject $H_0$ if $T \ge \chi^2_{\alpha,k-1-m}$
accept $H_0$ otherwise

An equivalent way of performing the foregoing is to first determine the value of the test statistic T, say $T = t$, and then compute

$$p\text{-value} \approx P\{\chi^2_{k-1-m} \ge t\}$$

The hypothesis would be rejected if $\alpha \ge p$-value.

EXAMPLE 11.3a Suppose the weekly number of accidents over a 30-week period is as follows:

8 0 0 1 3 4 0 2 12 5
1 8 0 2 0 1 9 3 4 5
3 3 4 7 4 0 1 2 1 2

Test the hypothesis that the number of accidents in a week has a Poisson distribution.

SOLUTION Since the total number of accidents in the 30 weeks is 95, the maximum likelihood estimate of the mean of the Poisson distribution is

$$\hat\lambda = \frac{95}{30} = 3.16667$$

Since the estimate of $P\{Y = i\}$ is then

$$\hat P\{Y = i\} = \frac{e^{-\hat\lambda}\hat\lambda^i}{i!}$$

we obtain, after some computation, that with the five regions as given in the beginning of this section,

$$\hat p_1 = .04214, \quad \hat p_2 = .13346, \quad \hat p_3 = .43434, \quad \hat p_4 = .28841, \quad \hat p_5 = .10164$$

Using the data values $X_1 = 6$, $X_2 = 5$, $X_3 = 8$, $X_4 = 6$, $X_5 = 5$, an additional computation yields the test statistic value

$$T = \sum_{i=1}^{5} \frac{(X_i - 30\hat p_i)^2}{30\hat p_i} = 21.99156$$

To determine the p-value, we run Program 5.8.1a. This yields

$$\begin{aligned}
p\text{-value} &\approx P\{\chi^2_3 > 21.99\}\\
&= 1 - .999936\\
&= .000064
\end{aligned}$$

and so the hypothesis of an underlying Poisson distribution is rejected. (Clearly, there were too many weeks having 0 accidents for the hypothesis that the underlying distribution is Poisson with mean 3.167 to be tenable.) ■
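A Python sketch of the computations in Example 11.3a follows. It groups the 30 weekly counts into the five regions, estimates $\lambda$ by the sample mean, and forms T. In place of Program 5.8.1a it uses the closed form of the chi-square upper tail that holds for 3 degrees of freedom, $P\{\chi^2_3 > t\} = \operatorname{erfc}(\sqrt{t/2}) + \sqrt{2t/\pi}\,e^{-t/2}$. The helper names `poisson_pmf` and `region` are our own. Because the sketch carries unrounded $\hat p_i$, T should come out near 21.99 rather than exactly 21.99156, and the p-value should be close to the .000064 above.

```python
import math

weekly = [8, 0, 0, 1, 3, 4, 0, 2, 12, 5,
          1, 8, 0, 2, 0, 1, 9, 3, 4, 5,
          3, 3, 4, 7, 4, 0, 1, 2, 1, 2]
n = len(weekly)              # 30 weeks
lam = sum(weekly) / n        # maximum likelihood estimate, 95/30


def poisson_pmf(i):
    return math.exp(-lam) * lam ** i / math.factorial(i)


def region(y):
    """Regions of Section 11.3: {0}, {1}, {2, 3}, {4, 5}, {6, 7, ...} -> 0..4."""
    if y <= 1:
        return y
    if y <= 3:
        return 2
    if y <= 5:
        return 3
    return 4


p_hat = [poisson_pmf(0), poisson_pmf(1),
         poisson_pmf(2) + poisson_pmf(3), poisson_pmf(4) + poisson_pmf(5)]
p_hat.append(1.0 - sum(p_hat))          # P{Y > 5}

x = [0] * 5
for y in weekly:
    x[region(y)] += 1                   # X_1, ..., X_5

t = sum((xk - n * pk) ** 2 / (n * pk) for xk, pk in zip(x, p_hat))

# k - 1 - m = 5 - 1 - 1 = 3 degrees of freedom; closed-form 3-df upper tail.
p_value = math.erfc(math.sqrt(t / 2)) + math.sqrt(2 * t / math.pi) * math.exp(-t / 2)
print(x, round(t, 2), p_value)
```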

11.4 TESTS OF INDEPENDENCE IN CONTINGENCY
TABLES

In this section, we consider problems in which each member of a population can be classified according to two distinct characteristics, which we shall denote as the X-characteristic and the Y-characteristic. We suppose that there are r possible values for the X-characteristic and s for the Y-characteristic, and let

$$P_{ij} = P\{X = i, Y = j\}$$

for $i = 1, \ldots, r$, $j = 1, \ldots, s$. That is, $P_{ij}$ represents the probability that a randomly chosen member of the population will have X-characteristic i and Y-characteristic j.
The different members of the population will be assumed to be independent. Also, let

$$p_i = P\{X = i\} = \sum_{j=1}^{s} P_{ij}, \qquad i = 1, \ldots, r$$

and

$$q_j = P\{Y = j\} = \sum_{i=1}^{r} P_{ij}, \qquad j = 1, \ldots, s$$

That is, $p_i$ is the probability that an arbitrary member of the population will have X-characteristic i, and $q_j$ is the probability it will have Y-characteristic j.

We are interested in testing the hypothesis that a population member's X- and Y-characteristics are independent. That is, we are interested in testing

$$H_0: P_{ij} = p_i q_j, \qquad \text{for all } i = 1, \ldots, r,\; j = 1, \ldots, s$$

against the alternative

$$H_1: P_{ij} \ne p_i q_j, \qquad \text{for some } i, j,\; i = 1, \ldots, r,\; j = 1, \ldots, s$$

To test this hypothesis, suppose that n members of the population have been sampled, with the result that $N_{ij}$ of them have simultaneously had X-characteristic i and Y-characteristic j, $i = 1, \ldots, r$, $j = 1, \ldots, s$.

Since the quantities $p_i$, $i = 1, \ldots, r$, and $q_j$, $j = 1, \ldots, s$, are not specified by the null hypothesis, they must first be estimated. Now since

$$N_i = \sum_{j=1}^{s} N_{ij}, \qquad i = 1, \ldots, r$$

represents the number of the sampled population members that have X-characteristic i, a natural (in fact, the maximum likelihood) estimator of $p_i$ is

$$\hat p_i = \frac{N_i}{n}, \qquad i = 1, \ldots, r$$

Similarly, letting

$$M_j = \sum_{i=1}^{r} N_{ij}, \qquad j = 1, \ldots, s$$

denote the number of sampled members having Y-characteristic j, the estimator for $q_j$ is

$$\hat q_j = \frac{M_j}{n}, \qquad j = 1, \ldots, s$$

At first glance, it may seem that we have had to use the data to estimate $r + s$ parameters. However, since the $p_i$'s and $q_j$'s have to sum to 1, that is, $\sum_{i=1}^{r} p_i = \sum_{j=1}^{s} q_j = 1$, we need estimate only $r - 1$ of the p's and $s - 1$ of the q's. (For instance, if r were equal to 2, then an estimate of $p_1$ would automatically provide an estimate of $p_2$ since $p_2 = 1 - p_1$.) Hence, we actually need estimate $r - 1 + s - 1 = r + s - 2$ parameters, and since each population member has $k = rs$ different possible values, it follows that the resulting test statistic will, for large n, have approximately a chi-square distribution with $rs - 1 - (r + s - 2) = (r - 1)(s - 1)$ degrees of freedom.

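Finally, since

$$\begin{aligned}
E[N_{ij}] &= nP_{ij}\\
&= np_i q_j \qquad \text{when } H_0 \text{ is true}
\end{aligned}$$

it follows that the test statistic is given by

$$T = \sum_{j=1}^{s}\sum_{i=1}^{r} \frac{(N_{ij} - n\hat p_i \hat q_j)^2}{n\hat p_i \hat q_j}$$

and the approximate significance level $\alpha$ test is to

reject $H_0$ if $T \ge \chi^2_{\alpha,(r-1)(s-1)}$
not reject $H_0$ otherwise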
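The statistic T and its degrees of freedom can be sketched in Python as follows, using the equivalent form $n\hat p_i \hat q_j = N_i M_j / n$ for the estimated expected counts. The function `independence_statistic` is a hypothetical helper, not the book's Program 11.4; applied to the table of Example 11.4a below it should return approximately 6.433 with 2 degrees of freedom.

```python
def independence_statistic(table):
    """Chi-square statistic for independence in an r x s contingency table.

    table[i][j] is N_ij.  Expected counts under H0 are estimated by
    n * p_hat_i * q_hat_j = N_i * M_j / n.
    Returns (T, (r - 1)(s - 1)).
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]        # N_i
    col_totals = [sum(col) for col in zip(*table)]  # M_j
    t = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            t += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return t, df


# Example 11.4a: rows = women, men; columns = Democrat, Republican, Independent.
print(independence_statistic([[68, 56, 32], [52, 72, 20]]))
```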

EXAMPLE 11.4a A sample of 300 people was randomly chosen, and the sampled individuals were classified as to their gender and political affiliation: Democrat, Republican, or Independent. The following table, called a contingency table, displays the resulting data.

            Democrat   Republican   Independent   Total
Women          68          56            32         156
Men            52          72            20         144
Total         120         128            52         300

Thus, for instance, the contingency table indicates that the sample of size 300 contained 68 women who classified themselves as Democrats, 56 women who classified themselves as Republicans, and 32 women who classified themselves as Independents; that is, $N_{11} = 68$, $N_{12} = 56$, and $N_{13} = 32$. Similarly, $N_{21} = 52$, $N_{22} = 72$, and $N_{23} = 20$.

Use these data to test the hypothesis that a randomly chosen individual's gender and 
political affiliation are independent. 

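SOLUTION From the above data, we obtain that the six values of $n\hat p_i \hat q_j = N_i M_j / n$ are as follows:

$$\begin{aligned}
\frac{N_1 M_1}{n} &= \frac{156 \times 120}{300} = 62.40\\
\frac{N_1 M_2}{n} &= \frac{156 \times 128}{300} = 66.56\\
\frac{N_1 M_3}{n} &= \frac{156 \times 52}{300} = 27.04\\
\frac{N_2 M_1}{n} &= \frac{144 \times 120}{300} = 57.60\\
\frac{N_2 M_2}{n} &= \frac{144 \times 128}{300} = 61.44\\
\frac{N_2 M_3}{n} &= \frac{144 \times 52}{300} = 24.96
\end{aligned}$$

The value of the test statistic is thus

$$\begin{aligned}
TS &= \frac{(68 - 62.40)^2}{62.40} + \frac{(56 - 66.56)^2}{66.56} + \frac{(32 - 27.04)^2}{27.04} + \frac{(52 - 57.60)^2}{57.60}\\
&\quad + \frac{(72 - 61.44)^2}{61.44} + \frac{(20 - 24.96)^2}{24.96}\\
&= 6.433
\end{aligned}$$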

Since $(r - 1)(s - 1) = 2$, we must compare the value of TS with the critical value $\chi^2_{.05,2}$. From Table A2,

$$\chi^2_{.05,2} = 5.991$$

Since $TS > 5.991$, the null hypothesis is rejected at the 5 percent level of significance. That is, the hypothesis that gender and political affiliation of members of the population are independent is rejected at the 5 percent level of significance. ■

The results of the test of independence of the characteristics of a randomly chosen member of the population can also be obtained by computing the resulting p-value. If the observed value of the test statistic is $T = t$, then the significance level $\alpha$ test would call for rejecting the hypothesis of independence if the p-value is less than or equal to $\alpha$, where

$$p\text{-value} = P_{H_0}\{T \ge t\}$$

Program 11.4 will compute the value of T.

EXAMPLE 11.4b A company operates four machines on three separate shifts daily. The following contingency table presents the data, during a 6-month time period, concerning the machine breakdowns that resulted.

Number of Breakdowns

                         Machine
               A      B      C      D     Total per Shift
Shift 1       10     12      6      7           35
Shift 2       10     24      9     10           53
Shift 3       13     20      7     10           50
Total per
Machine       33     56     22     27          138



Suppose we are interested in determining whether a machine's breakdown probability 
during a particular shift is influenced by that shift. In other words, we are interested in 
testing, for an arbitrary breakdown, whether the machine causing the breakdown and the 
shift on which the breakdown occurred are independent. 

SOLUTION A direct computation, or the use of Program 11.4, gives that the value of the test statistic is 1.8148 (see Figure 11.2). Utilizing Program 5.8.1a then gives that

$$\begin{aligned}
p\text{-value} &\approx P\{\chi^2_6 \ge 1.8148\}\\
&= 1 - .0641\\
&= .9359
\end{aligned}$$

and so the hypothesis that the machine that causes a breakdown is independent of the shift on which the breakdown occurs is accepted. ■
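When the number of degrees of freedom is even, say $2m$, the chi-square upper tail has the closed form $P\{\chi^2_{2m} > x\} = \sum_{j=0}^{m-1} e^{-x/2}(x/2)^j/j!$, so the p-value of Example 11.4b can be checked without Program 5.8.1a. A Python sketch follows; the function name `chi2_upper_tail_even` is our own.

```python
import math


def chi2_upper_tail_even(x: float, df: int) -> float:
    """P{chi-square(df) > x} for even df = 2m, via the identity
    P{chi^2_{2m} > x} = sum_{j=0}^{m-1} e^{-x/2} (x/2)^j / j!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = math.exp(-half), 0.0
    for j in range(df // 2):
        total += term              # adds e^{-x/2} (x/2)^j / j!
        term *= half / (j + 1)
    return total


# Example 11.4b: (r - 1)(s - 1) = (3 - 1)(4 - 1) = 6 degrees of freedom.
print(round(chi2_upper_tail_even(1.8148, 6), 4))  # 0.9359
```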



11.5 TESTS OF INDEPENDENCE IN CONTINGENCY
TABLES HAVING FIXED MARGINAL TOTALS

In Example 11.4a, we were interested in determining whether gender and political affiliation were dependent in a particular population. To test this hypothesis, we first chose
FIGURE 11.2 The test statistic for independence in a contingency table: Program 11.4 applied to the machine-breakdown data of Example 11.4b (rows = shifts 1 to 3, columns = machines A to D). Output: "The test statistic has value t = 1.81478".



a random sample of people from this population and then noted their characteristics. 
However, another way in which we could gather data is to fix in advance the numbers of 
men and women in the sample and then choose random samples of those sizes from the 
subpopulations of men and women. That is, rather than let the numbers of women and 
men in the sample be determined by chance, we might decide these numbers in advance. 
Because doing so would result in fixed specified values for the total numbers of men and 
women in the sample, the resulting contingency table is often said to have fixed margins 
(since the totals are given in the margins of the table).

It turns out that even when the data are collected in the manner prescribed above, the same hypothesis test as given in Section 11.4 can still be used to test for the independence of the two characteristics. The test statistic remains

$$TS = \sum_i \sum_j \frac{(N_{ij} - \hat e_{ij})^2}{\hat e_{ij}}$$

where

$N_{ij}$ = number of members of sample who have both X-characteristic i and Y-characteristic j
$N_i$ = number of members of sample who have X-characteristic i
$M_j$ = number of members of sample who have Y-characteristic j

and

$$\hat e_{ij} = n\hat p_i \hat q_j = \frac{N_i M_j}{n}$$

where n is the total size of the sample.

In addition, it is still true that when $H_0$ is true, TS will approximately have a chi-square distribution with $(r - 1)(s - 1)$ degrees of freedom. (The quantities r and s refer, of course, to the numbers of possible values of the X- and Y-characteristic, respectively.) In other words, the test of the independence hypothesis is unaffected by whether the marginal totals of one characteristic are fixed in advance or result from a random sample of the entire population.

EXAMPLE 1 1.5a A randomly chosen group of 20,000 nonsmokers and one of 10,000 
smokers were followed over a 10-year period. The following data relate the numbers 
of them that developed lung cancer during that period. 



Smokers Nonsmokers Total 

Lung cancer 62 14 76 

No lung cancer 9,938 19,986 29,924 

Total 10,000 20,000 30,000 



Test the hypothesis that smoking and lung cancer are independent. Use the 1 percent level 
of significance. 

SOLUTION The estimates of the expected number to fall in each ij cell when smoking and lung cancer are independent are

$$\begin{aligned}
\hat e_{11} &= \frac{(76)(10{,}000)}{30{,}000} = 25.33\\
\hat e_{12} &= \frac{(76)(20{,}000)}{30{,}000} = 50.67\\
\hat e_{21} &= \frac{(29{,}924)(10{,}000)}{30{,}000} = 9{,}974.67\\
\hat e_{22} &= \frac{(29{,}924)(20{,}000)}{30{,}000} = 19{,}949.33
\end{aligned}$$
Therefore, the value of the test statistic is

$$\begin{aligned}
TS &= \frac{(62 - 25.33)^2}{25.33} + \frac{(14 - 50.67)^2}{50.67} + \frac{(9{,}938 - 9{,}974.67)^2}{9{,}974.67} + \frac{(19{,}986 - 19{,}949.33)^2}{19{,}949.33}\\
&= 53.09 + 26.54 + .13 + .07 = 79.83
\end{aligned}$$

Since this is far larger than $\chi^2_{.01,1} = 6.635$, we reject the null hypothesis that whether a randomly chosen person develops lung cancer is independent of whether that person is a smoker. ■
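For a 2 × 2 table the statistic has the well-known shortcut $TS = n(ad - bc)^2/(r_1 r_2 c_1 c_2)$, where a, b, c, d are the cell counts and $r_i$, $c_j$ the row and column totals. The Python sketch below (function name hypothetical) computes TS both cell by cell and by the shortcut for Example 11.5a. Because it does not round the expected counts to two decimals, it gives about 79.81 rather than the 79.83 shown above; the conclusion is unchanged.

```python
def two_by_two_statistic(a, b, c, d):
    """Chi-square statistic for the 2 x 2 table [[a, b], [c, d]], computed
    cell by cell and by the shortcut n(ad - bc)^2 / (r1 r2 c1 c2)."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    cells = ((a, 0, 0), (b, 0, 1), (c, 1, 0), (d, 1, 1))
    cellwise = sum((obs - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
                   for obs, i, j in cells)
    shortcut = n * (a * d - b * c) ** 2 / (rows[0] * rows[1] * cols[0] * cols[1])
    return cellwise, shortcut


# Example 11.5a: rows = lung cancer / no lung cancer, columns = smokers / nonsmokers.
cellwise, shortcut = two_by_two_statistic(62, 14, 9_938, 19_986)
print(round(cellwise, 2), round(shortcut, 2), cellwise > 6.635)  # 6.635 = chi-square_{.01,1}
```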

We now show how to use the framework of this section to test the hypothesis that m discrete population distributions are equal. Consider m separate populations, each of whose members takes on one of the values $1, \ldots, n$. Suppose that a randomly chosen member of population i will have value j with probability

$$p_{i,j}, \qquad i = 1, \ldots, m, \quad j = 1, \ldots, n$$

and consider a test of the null hypothesis

$$H_0: p_{1,j} = p_{2,j} = p_{3,j} = \cdots = p_{m,j}, \qquad \text{for each } j = 1, \ldots, n$$

To obtain a test of this null hypothesis, consider first the superpopulation consisting of all members of each of the m populations. Any member of this superpopulation can be classified according to two characteristics. The first characteristic specifies which of the m populations the member is from, and the second characteristic specifies its value. The hypothesis that the population distributions are equal becomes the hypothesis that, for each value, the proportion of members of each population having that value is the same. But this is exactly the same as saying that the two characteristics of a randomly chosen member of the superpopulation are independent. (That is, the value of a randomly chosen superpopulation member is independent of the population to which this member belongs.)

Therefore, we can test $H_0$ by randomly choosing sample members from each population. If we let $M_i$ denote the sample size from population i and let $N_{i,j}$ denote the number of values from that sample that are equal to j, $i = 1, \ldots, m$, $j = 1, \ldots, n$, then we can test $H_0$ by testing for independence in the following contingency table.
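$$\begin{array}{c|cccccc|c}
\text{Value}\,\backslash\,\text{Population} & 1 & 2 & \cdots & i & \cdots & m & \text{Totals}\\
\hline
1 & N_{1,1} & N_{2,1} & \cdots & N_{i,1} & \cdots & N_{m,1} & N_1\\
2 & N_{1,2} & N_{2,2} & \cdots & N_{i,2} & \cdots & N_{m,2} & N_2\\
\vdots & \vdots & \vdots & & \vdots & & \vdots & \vdots\\
j & N_{1,j} & N_{2,j} & \cdots & N_{i,j} & \cdots & N_{m,j} & N_j\\
\vdots & \vdots & \vdots & & \vdots & & \vdots & \vdots\\
n & N_{1,n} & N_{2,n} & \cdots & N_{i,n} & \cdots & N_{m,n} & N_n\\
\hline
\text{Totals} & M_1 & M_2 & \cdots & M_i & \cdots & M_m &
\end{array}$$

Note that $N_j$ denotes the number of sampled members that have value j.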

EXAMPLE 11.5b A recent study reported that 500 female office workers were randomly chosen and questioned in each of four different countries. One of the questions related to whether these women often received verbal or sexual abuse on the job. The following data resulted.

Country          Number Reporting Abuse
Australia        28
Germany          30
Japan            58
United States    55

Based on these data, is it plausible that the proportions of female office workers who often feel abused at work are the same for these countries?

SOLUTION Putting the above data in the form of a contingency table gives the following.

                                  Country
                         1      2      3      4     Totals
Receive abuse           28     30     58     55        171
Do not receive abuse   472    470    442    445      1,829
Totals                 500    500    500    500      2,000



We can now test the null hypothesis by testing for independence in the preceding contingency table. If we run Program 11.4, then the value of the test statistic and the resulting p-value are

$$TS = 19.51, \qquad p\text{-value} \approx .0002$$

Therefore, the hypothesis that the percentages of women who feel they are being abused on the job are the same for these countries is rejected at the 1 percent level of significance (and, indeed, at any significance level above .02 percent). ■
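When the table has only two rows, as here, the statistic can also be written as $\sum_i (x_i - M_i\hat p)^2/(M_i\hat p(1 - \hat p))$, where $x_i$ is the number reporting abuse in country i and $\hat p$ is the pooled proportion 171/2,000; this follows by adding the two cells in each column. A brief Python sketch checking the value 19.51 from Example 11.5b (variable names our own):

```python
abuse = [28, 30, 58, 55]          # Australia, Germany, Japan, United States
sizes = [500, 500, 500, 500]      # M_i, fixed in advance by the design
pooled = sum(abuse) / sum(sizes)  # 171 / 2000

# For a 2 x m table the two cells of column i contribute
# (x_i - M_i * pooled)^2 / (M_i * pooled * (1 - pooled)) in total.
ts = sum((x - m * pooled) ** 2 / (m * pooled * (1 - pooled))
         for x, m in zip(abuse, sizes))
df = (2 - 1) * (len(abuse) - 1)   # (r - 1)(s - 1) = 3
print(round(ts, 2), df)
```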

*11.6 THE KOLMOGOROV-SMIRNOV GOODNESS OF
FIT TEST FOR CONTINUOUS DATA

Suppose now that $Y_1, \ldots, Y_n$ represent sample data from a continuous distribution, and suppose that we wish to test the null hypothesis $H_0$ that F is the population distribution, where F is a specified continuous distribution function. One approach to testing $H_0$ is to break up the set of possible values of the $Y_j$ into k distinct intervals, say,

$$(y_0, y_1), (y_1, y_2), \ldots, (y_{k-1}, y_k), \qquad \text{where } y_0 = -\infty,\; y_k = +\infty$$

and then consider the discretized random variables $Y_j^d$, $j = 1, \ldots, n$, defined by

$$Y_j^d = i \qquad \text{if } Y_j \text{ lies in the interval } (y_{i-1}, y_i)$$

The null hypothesis then implies that

$$P\{Y_j^d = i\} = F(y_i) - F(y_{i-1}), \qquad i = 1, \ldots, k$$

and this can be tested by the chi-square goodness of fit test already presented.

There is, however, another way of testing that the $Y_j$ come from the continuous distribution function F that is generally more efficient than discretizing; it works as follows. After observing $Y_1, \ldots, Y_n$, let $F_e$ be the empirical distribution function defined by

$$F_e(x) = \frac{\#\{i : Y_i \le x\}}{n}$$

That is, $F_e(x)$ is the proportion of the observed values that are less than or equal to x. Because $F_e(x)$ is a natural estimator of the probability that an observation is less than or equal to x, it follows that, if the null hypothesis that F is the underlying distribution is correct, it should be close to $F(x)$. Since this is so for all x, a natural quantity on which to base a test of $H_0$ is the test quantity

$$D = \max_x |F_e(x) - F(x)|$$

where the maximum is over all values of x from $-\infty$ to $+\infty$. The quantity D is called the Kolmogorov-Smirnov test statistic.



* Optional section. 



To compute the value of D for a given data set $Y_j = y_j$, $j = 1, \ldots, n$, let $y_{(1)}, y_{(2)}, \ldots, y_{(n)}$ denote the values of the $y_j$ in increasing order. That is,

$$y_{(j)} = j\text{th smallest of } y_1, \ldots, y_n$$

For example, if $n = 3$ and $y_1 = 3$, $y_2 = 5$, $y_3 = 1$, then $y_{(1)} = 1$, $y_{(2)} = 3$, $y_{(3)} = 5$. Since $F_e(x)$ can be written

$$F_e(x) = \begin{cases}
0 & \text{if } x < y_{(1)}\\
\frac{1}{n} & \text{if } y_{(1)} \le x < y_{(2)}\\
\quad\vdots & \\
\frac{j}{n} & \text{if } y_{(j)} \le x < y_{(j+1)}\\
\quad\vdots & \\
1 & \text{if } y_{(n)} \le x
\end{cases}$$

we see that $F_e(x)$ is constant within the intervals $(y_{(j-1)}, y_{(j)})$ and then jumps by $1/n$ at the points $y_{(1)}, \ldots, y_{(n)}$. Since $F(x)$ is an increasing function of x that is bounded by 1, it follows that the maximum value of $F_e(x) - F(x)$ is nonnegative and occurs at one of the points $y_{(j)}$, $j = 1, \ldots, n$ (see Figure 11.3). That is,

$$\max_x \{F_e(x) - F(x)\} = \max_{j=1,\ldots,n}\left\{\frac{j}{n} - F(y_{(j)})\right\} \qquad (11.6.1)$$




FIGURE 11.3 n = 5. 






Similarly, the maximum value of $F(x) - F_e(x)$ is also nonnegative and occurs immediately before one of the jump points $y_{(j)}$, and so

$$\max_x \{F(x) - F_e(x)\} = \max_{j=1,\ldots,n}\left\{F(y_{(j)}) - \frac{j-1}{n}\right\} \qquad (11.6.2)$$

From Equations 11.6.1 and 11.6.2, we see that

$$\begin{aligned}
D &= \max_x |F_e(x) - F(x)|\\
&= \max\left\{\max_x \{F_e(x) - F(x)\},\; \max_x \{F(x) - F_e(x)\}\right\}\\
&= \max\left\{\frac{j}{n} - F(y_{(j)}),\; F(y_{(j)}) - \frac{j-1}{n},\; j = 1, \ldots, n\right\} \qquad (11.6.3)
\end{aligned}$$



Equation 11.6.3 can be used to compute the value of D.
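Equation 11.6.3 translates directly into code. The Python sketch below (the name `ks_statistic` is our own) sorts the sample and takes the larger of the two one-sided deviations at each order statistic. With the data and exponential F of Example 11.6a, it should give $D \approx .4831487$.

```python
import math


def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov D from Equation 11.6.3:
    max over j of max(j/n - F(y_(j)), F(y_(j)) - (j - 1)/n)."""
    ys = sorted(sample)
    n = len(ys)
    return max(max(j / n - cdf(y), cdf(y) - (j - 1) / n)
               for j, y in enumerate(ys, start=1))


# Example 11.6a: hypothesized exponential distribution with mean 100.
data = [66, 72, 81, 94, 112, 116, 124, 140, 145, 155]
d = ks_statistic(data, lambda x: 1 - math.exp(-x / 100))
print(d)
```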

Suppose now that the $Y_j$ are observed and their values are such that $D = d$. Since a large value of D would appear to be inconsistent with the null hypothesis that F is the underlying distribution, it follows that the p-value for this data set is given by

$$p\text{-value} = P_F\{D \ge d\}$$

where we have written $P_F$ to make explicit that this probability is to be computed under the assumption that $H_0$ is correct (and so F is the underlying distribution).

The above p-value can be approximated by a simulation that is made easier by the following proposition, which shows that $P_F\{D \ge d\}$ does not depend on the underlying distribution F. This result enables us to estimate the p-value by doing the simulation with any continuous distribution F we choose [thus allowing us to use the uniform (0, 1) distribution].

PROPOSITION 11.6.1

$P_F\{D \ge d\}$ is the same for any continuous distribution F.



Proof

$$\begin{aligned}
P_F\{D \ge d\} &= P_F\left\{\max_x \left|\frac{\#\{i : Y_i \le x\}}{n} - F(x)\right| \ge d\right\}\\
&= P_F\left\{\max_x \left|\frac{\#\{i : F(Y_i) \le F(x)\}}{n} - F(x)\right| \ge d\right\}\\
&= P\left\{\max_x \left|\frac{\#\{i : U_i \le F(x)\}}{n} - F(x)\right| \ge d\right\}
\end{aligned}$$

where $U_1, \ldots, U_n$ are independent uniform (0, 1) random variables: the first equality following because F is an increasing function and so $Y \le x$ is equivalent to $F(Y) \le F(x)$; and the second because of the result (whose proof is left as an exercise) that if Y has the continuous distribution F then the random variable $F(Y)$ is uniform on (0, 1).

Continuing the above, we see by letting $y = F(x)$ and noting that as x ranges from $-\infty$ to $+\infty$, $F(x)$ ranges from 0 to 1, that

$$P_F\{D \ge d\} = P\left\{\max_{0 \le y \le 1}\left|\frac{\#\{i : U_i \le y\}}{n} - y\right| \ge d\right\}$$

which shows that the distribution of D, when $H_0$ is true, does not depend on the actual distribution F. ■

It follows from the above proposition that after the value of D is determined from the data, say, $D = d$, the p-value can be obtained by doing a simulation with the uniform (0, 1) distribution. That is, we generate a set of n random numbers $U_1, \ldots, U_n$ and then check whether or not the inequality

$$\max_{0 \le y \le 1}\left|\frac{\#\{i : U_i \le y\}}{n} - y\right| \ge d$$

is valid. This is then repeated many times and the proportion of times that it is valid is our estimate of the p-value of the data set. As noted earlier, the left side of the inequality can be computed by ordering the random numbers and then using the identity

$$\max_{0 \le y \le 1}\left|\frac{\#\{i : U_i \le y\}}{n} - y\right| = \max_{j=1,\ldots,n}\left\{\frac{j}{n} - U_{(j)},\; U_{(j)} - \frac{j-1}{n}\right\}$$

where $U_{(j)}$ is the jth smallest value of $U_1, \ldots, U_n$. For example, if $n = 3$ and $U_1 = .7$, $U_2 = .6$, $U_3 = .4$, then $U_{(1)} = .4$, $U_{(2)} = .6$, $U_{(3)} = .7$, and the value of D for this data set is

$$D = \max\left\{\frac{1}{3} - .4,\; \frac{2}{3} - .6,\; 1 - .7,\; .4,\; .6 - \frac{1}{3},\; .7 - \frac{2}{3}\right\} = .4$$

A significance level $\alpha$ test can be obtained by considering the quantity $D^*$ defined by

$$D^* = \left(\sqrt{n} + .12 + \frac{.11}{\sqrt{n}}\right) D$$

Letting $d^*_\alpha$ be such that

$$P_F\{D^* \ge d^*_\alpha\} = \alpha$$

then the following are accurate approximations for $d^*_\alpha$ for a variety of values:

$$d^*_{.1} = 1.224, \qquad d^*_{.05} = 1.358, \qquad d^*_{.025} = 1.480, \qquad d^*_{.01} = 1.626$$

The level $\alpha$ test would reject the null hypothesis that F is the distribution if the observed value of $D^*$ is at least as large as $d^*_\alpha$.

EXAMPLE 11.6a Suppose we want to test the hypothesis that a given population distribution is exponential with mean 100; that is, $F(x) = 1 - e^{-x/100}$. If the (ordered) values from a sample of size 10 from this distribution are

66, 72, 81, 94, 112, 116, 124, 140, 145, 155

what conclusion can be drawn?

SOLUTION To answer the above, we first employ Equation 11.6.3 to compute the value of the Kolmogorov-Smirnov test quantity D. After some computation this gives the result $D = .4831487$, which results in

$$D^* = .48315\left(\sqrt{10} + 0.12 + 0.11/\sqrt{10}\right) = 1.603$$

Because this exceeds $d^*_{.025} = 1.480$, it follows that the null hypothesis that the data come from an exponential distribution with mean 100 would be rejected at the 2.5 percent level of significance. (On the other hand, since $1.603 < d^*_{.01} = 1.626$, it would not be rejected at the 1 percent level of significance.) ■
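The final step of Example 11.6a, converting D to $D^*$ and comparing it with the tabulated $d^*_\alpha$, can be sketched in Python as follows. The names `modified_ks` and `critical` are our own; the critical values are those listed above.

```python
import math

# d*_alpha approximations listed in the text
critical = {0.10: 1.224, 0.05: 1.358, 0.025: 1.480, 0.01: 1.626}


def modified_ks(d, n):
    """D* = (sqrt(n) + .12 + .11/sqrt(n)) * D."""
    root = math.sqrt(n)
    return (root + 0.12 + 0.11 / root) * d


d_star = modified_ks(0.4831487, 10)
decisions = {alpha: d_star >= c for alpha, c in critical.items()}  # True = reject H0
print(round(d_star, 3), decisions)
```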



Problems 

1. According to the Mendelian theory of genetics, a certain garden pea plant should 
produce either white, pink, or red flowers, with respective probabilities 1/4, 1/2, 1/4.
To test this theory, a sample of 564 peas was studied with the result that 141 
produced white, 291 produced pink, and 132 produced red flowers. Using the 
chi-square approximation, what conclusion would be drawn at the 5 percent level 
of significance? 

2. To ascertain whether a certain die was fair, 1,000 rolls of the die were recorded, 
with the following results. 



Outcome Number of Occurrences 

1 158 

2 172 

3 164 

4 181 

5 160 

6 165 
Test the hypothesis that the die is fair (that is, that $p_i = 1/6$, $i = 1, \ldots, 6$) at the 5 percent level of significance. Use the chi-square approximation.

3. Determine the birth and death dates of 100 famous individuals and, using the 
four-category approach of Example 11.2a, test the hypothesis that the death month
is not affected by the birth month. Use the chi-square approximation. 

4. It is believed that the daily number of electrical power failures in a certain 
Midwestern city is a Poisson random variable with mean 4.2. Test this hypothesis 
if over 150 days the number of days having i power failures is as follows: 



Failures          0   1   2   3   4   5   6   7   8   9  10  11
Number of Days    0   5  22  23  32  22  19  13   6   4   4   0

5. Among 100 vacuum tubes tested, 41 had lifetimes of less than 30 hours, 31 had 
lifetimes between 30 and 60 hours, 13 had lifetimes between 60 and 90 hours, 
and 15 had lifetimes of greater than 90 hours. Are these data consistent with the
hypothesis that a vacuum tube's lifetime is exponentially distributed with a mean 
of 50 hours? 

6. The past output of a machine indicates that each unit it produces will be 

top grade with probability .40 

high grade with probability .30 

medium grade with probability .20 

low grade with probability .10 

A new machine, designed to perform the same job, has produced 500 units with 
the following results. 
top grade       234
high grade      117
medium grade     81
low grade        68



Can the difference in output be ascribed solely to chance? 

7. The neutrino radiation from outer space was observed during several days. The 
frequencies of signals were recorded for each sidereal hour and are as given 
below: 

Frequency of Neutrino Radiation from Outer Space 



Hour Starting at   Frequency of Signals   Hour Starting at   Frequency of Signals
 0                 24                     12                 29
 1                 24                     13                 26
 2                 36                     14                 38
 3                 32                     15                 26
 4                 33                     16                 37
 5                 36                     17                 28
 6                 41                     18                 43
 7                 24                     19                 30
 8                 37                     20                 40
 9                 37                     21                 22
10                 49                     22                 30
11                 51                     23                 42



Test whether the signals are uniformly distributed over the 24-hour period. 

8. Neutrino radiation was observed over a certain period and the number of hours in
which 0, 1, 2, ... signals were received was recorded.

Number of          Number of Hours with
Signals per Hour   This Frequency of Signals
0                  1,924
1                  541
2                  103
3                  17
4                  1
5                  1
6 or more          0

Test the hypothesis that the observations come from a population having a Poisson 
distribution with mean .3. 
9. In a certain region, insurance data indicate that 82 percent of drivers have no
accidents in a year, 15 percent have exactly 1 accident, and 3 percent have 2 or
more accidents. In a random sample of 440 engineers, 366 had no accidents, 68 
had exactly 1 accident, and 6 had 2 or more. Can you conclude that engineers 
follow an accident profile that is different from the rest of the drivers in the region? 

10. A study was instigated to see if southern California earthquakes of at least moderate
size (having values of at least 4.4 on the Richter scale) are more likely to occur on
certain days of the week than on others. The catalogs yielded the following data
on 1,100 earthquakes.



Day                     Sun   Mon   Tues   Wed   Thurs   Fri   Sat
Number of Earthquakes   156   144   170    158   172     148   152

Test, at the 5 percent level, the hypothesis that an earthquake is equally likely to 
occur on any of the 7 days of the week. 

11. Sometimes reported data fit a model so well that it makes one suspicious that the 
data are not being accurately reported. For instance, a friend of mine has reported 
that he tossed a fair coin 40,000 times and obtained 20,004 heads and 19,996 
tails. Is such a result believable? Explain your reasoning. 

12. Use simulation to determine the p-value and compare it with the result you
obtained using the chi-square approximation in Problem 1. Let the number of 
simulation runs be 

(a) 1,000; 

(b) 5,000; 

(c) 10,000. 

13. A sample of size 120 had a sample mean of 100 and a sample standard deviation 
of 15. Of these 120 data values, 3 were less than 70; 18 were between 70 and 85; 
30 were between 85 and 100; 35 were between 100 and 115; 32 were between 
115 and 130; and 2 were greater than 130. Test the hypothesis that the sample 
distribution was normal. 

14. In Problem 4, test the hypothesis that the daily number of failures has a Poisson 
distribution. 

15. A random sample of 500 families was classified by region and income (in units of 
$1,000). The following data resulted. 



Income   South   North
0-10     42      53
10-20    55      90
20-30    47      88
>30      36      89

Determine the p-value of the test that a family's income and region are
independent.

16. The following data relate the mother's age and the birthweight (in grams) of her 
child. 





                                 Birthweight
Maternal Age       Less Than 2,500 Grams   More Than 2,500 Grams
20 years or less   10                      40
Greater than 20    15                      135

Test the hypothesis that the baby's birthweight is independent of the mother's age. 

17. Repeat Problem 16 with all of the data values doubled — that is, with these data: 

20 80 
30 270 

18. The number of infant mortalities as a function of the baby's birthweight (in grams) 
for 72,730 live white births in New York in 1974 is as follows: 





                     Outcome at the End of 1 Year
Birthweight          Alive     Dead
Less than 2,500      4,597     618
Greater than 2,500   67,093    422

Test the hypothesis that the birthweight is independent of whether or not the baby 
survives its first year. 

19. An experiment designed to study the relationship between hypertension and 
cigarette smoking yielded the following data. 

Nonsmoker Moderate Smoker Heavy Smoker 

Hypertension 20 38 28 

No hypertension 50 27 18 

Test the hypothesis that whether or not an individual has hypertension is 
independent of how much that person smokes. 

20. The following table shows the number of defective, acceptable, and superior items 
in samples taken both before and after the introduction of a modification in the 
manufacturing process. 
         Defective   Acceptable   Superior
Before   25          218          22
After    9           103          14

Is this change significant at the .05 level? 

21. A sample of 300 cars having cellular phones and one of 400 cars without phones 
were tracked for 1 year. The following table gives the number of these cars involved 
in accidents over that year. 



Accident No Accident 



Cellular phone 22 278 

No phone 26 374 

Use the above to test the hypothesis that having a cellular phone in your car 
and being involved in an accident are independent. Use the 5 percent level of 
significance. 

22. To study the effect of fluoridated water supplies on tooth decay, two communities 
of roughly the same socioeconomic status were chosen. One of these communities 
had fluoridated water while the other did not. Random samples of 200 teenagers 
from both communities were chosen, and the numbers of cavities they had were 
determined. The following data resulted. 

Cavities    Fluoridated Town   Nonfluoridated Town
0           154                133
1           20                 18
2           14                 21
3 or more   12                 28

Do these data establish, at the 5 percent level of significance, that the number
of dental cavities a person has is not independent of whether that person's water
supply is fluoridated? What about at the 1 percent level?

23. To determine if a malpractice lawsuit is more likely to follow certain types of
surgery, random samples of three different types of surgeries were studied, and the
following data resulted.

Type of Operation   Number Sampled   Number Leading to a Lawsuit
Heart surgery       400              16
Brain surgery       300              19
Appendectomy        300              7

Test the hypothesis that the percentages of the surgical operations that lead to
lawsuits are the same for each of the three types.

(a) Use the 5 percent level of significance.

(b) Use the 1 percent level of significance.

24. In a famous article (S. Russell, "A red sky at night...," Metropolitan Magazine 
London, 61, p. 15, 1926) the following data set of frequencies of sunset colors and 
whether each was followed by rain was presented. 

Sky Color Number of Observations Number Followed by Rain 

Red 61 26 

Mainly red 194 52 

Yellow 159 81 

Mainly yellow 188 86 

Red and yellow 194 52 

Gray 302 167 

Test the hypothesis that whether it rains tomorrow is independent of the color of 
today's sunset. 

25. Data are said to be from a lognormal distribution with parameters \(\mu\) and \(\sigma\) if the
natural logarithms of the data are normally distributed with mean \(\mu\) and standard
deviation \(\sigma\). Use the Kolmogorov-Smirnov test with significance level .05 to
decide whether the following lifetimes (in days) of a sample of cancer-bearing
mice that have been treated with a certain cancer therapy might come from a
lognormal distribution with parameters \(\mu = 3\) and \(\sigma = 4\).

24, 12, 36, 40, 16, 10, 12, 30, 38, 14, 22, 18 
NONPARAMETRIC HYPOTHESIS TESTS



12.1 INTRODUCTION 

In this chapter, we shall develop some hypothesis tests in situations where the data come 
from a probability distribution whose underlying form is not specified. That is, it will not 
be assumed that the underlying distribution is normal, or exponential, or any other given 
type. Because no particular parametric form for the underlying distribution is assumed, 
such tests are called nonparametric. 

The strength of a nonparametric test resides in the fact that it can be applied without any 
assumption on the form of the underlying distribution. Of course, if there is justification 
for assuming a particular parametric form, such as the normal, then the relevant parametric 
test should be employed. 

In Section 12.2, we consider hypotheses concerning the median of a continuous dis- 
tribution and show how the sign test can be used in their study. In Section 12.3, we 
consider the signed rank test, which is used to test the hypothesis that a continuous popu- 
lation distribution is symmetric about a specified value. In Section 12.4, we consider the 
two-sample problem, where one wants to use data from two separate continuous distribu- 
tions to test the hypothesis that the distributions are equal, and present the rank sum test. 
Finally, in Section 12.5 we study the runs test, which can be used to test the hypothesis that 
a sequence of 0's and 1's constitutes a random sequence that does not follow any specified
pattern. 

12.2 THE SIGN TEST 

Let \(X_1, \ldots, X_n\) denote a sample from a continuous distribution \(F\) and suppose that we
are interested in testing the hypothesis that the median of \(F\), call it \(m\), is equal to a
specified value \(m_0\). That is, consider a test of

\[ H_0: m = m_0 \quad \text{versus} \quad H_1: m \neq m_0 \]

where \(m\) is such that \(F(m) = .5\).
This hypothesis can easily be tested by noting that each of the observations will,
independently, be less than \(m_0\) with probability \(F(m_0)\). Hence, if we let

\[ I_i = \begin{cases} 1 & \text{if } X_i < m_0 \\ 0 & \text{if } X_i \geq m_0 \end{cases} \]

then \(I_1, \ldots, I_n\) are independent Bernoulli random variables with parameter \(p = F(m_0)\);
and so the null hypothesis is equivalent to stating that this Bernoulli parameter is equal to
1/2. Now, if \(v\) is the observed value of \(\sum_{i=1}^n I_i\) (that is, if \(v\) is the number of data values
less than \(m_0\)), then it follows from the results of Section 8.6 that the p-value of the test
that this Bernoulli parameter is equal to 1/2 is

\[ p\text{-value} = 2\min\left(P\{\text{Bin}(n, 1/2) \leq v\},\ P\{\text{Bin}(n, 1/2) \geq v\}\right) \qquad (12.2.1) \]

where \(\text{Bin}(n, p)\) is a binomial random variable with parameters \(n\) and \(p\).
However,

\[ \begin{aligned} P\{\text{Bin}(n, p) \geq v\} &= P\{n - \text{Bin}(n, p) \leq n - v\} \\ &= P\{\text{Bin}(n, 1 - p) \leq n - v\} \quad \text{(why?)} \end{aligned} \]

and so we see from Equation 12.2.1 that the p-value is given by

\[ \begin{aligned} p\text{-value} &= 2\min\left(P\{\text{Bin}(n, 1/2) \leq v\},\ P\{\text{Bin}(n, 1/2) \leq n - v\}\right) \\ &= \begin{cases} 2P\{\text{Bin}(n, 1/2) \leq v\} & \text{if } v \leq n/2 \\ 2P\{\text{Bin}(n, 1/2) \leq n - v\} & \text{if } v \geq n/2 \end{cases} \end{aligned} \qquad (12.2.2) \]

Since the value of \(v = \sum_{i=1}^n I_i\) depends on the signs of the terms \(X_i - m_0\), the foregoing
is called the sign test.

EXAMPLE 12.2a If a sample of size 200 contains 120 values that are less than \(m_0\) and 80
values that are greater, what is the p-value of the test of the hypothesis that the median is
equal to \(m_0\)?

SOLUTION From Equation 12.2.2, the p-value is equal to twice the probability that a
binomial random variable with parameters 200, .5 is less than or equal to 80.
The text disk shows that

\[ P\{\text{Bin}(200, .5) \leq 80\} = .00284 \]

Therefore, the p-value is .00568, and so the null hypothesis would be rejected at even the
1 percent level of significance. ■
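The following minimal Python sketch computes Equation 12.2.2 with exact binomial sums. The function names are illustrative and are not the text's own program. The result is capped at 1, which covers the case \(v = n/2\), where the doubled tail probability would otherwise exceed 1.

```python
from math import comb

def binom_cdf_half(n: int, k: int) -> float:
    """Exact P{Bin(n, 1/2) <= k}."""
    if k < 0:
        return 0.0
    k = min(k, n)
    return sum(comb(n, i) for i in range(k + 1)) / 2 ** n

def sign_test_pvalue(n: int, v: int) -> float:
    """Two-sided sign test p-value (Equation 12.2.2).

    n: sample size; v: number of observations below the hypothesized median m0.
    """
    # 2 * P{Bin(n, 1/2) <= min(v, n - v)}, capped at 1 (needed when v = n/2)
    return min(1.0, 2 * binom_cdf_half(n, min(v, n - v)))

p = sign_test_pvalue(200, 120)   # Example 12.2a: 120 of 200 values below m0
print(f"{p:.5f}")
```

The output should agree with the .00568 obtained in Example 12.2a.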
The sign test can also be used in situations analogous to ones in which the paired
t-test was previously applied. For instance, let us reconsider Example 8.4c, which is
interested in testing whether or not a recently instituted industrial safety program has had
an effect on the number of man-hours lost to accidents. For each of 10 plants, the data
consisted of the pair \(X_i, Y_i\), which represented, respectively, the average weekly loss at
plant \(i\) before and after the program. Letting \(Z_i = X_i - Y_i\), \(i = 1, \ldots, 10\), it follows
that if the program had not had any effect, then \(Z_i\), \(i = 1, \ldots, 10\), would be a sample
from a distribution whose median value is 0. The resulting values of \(Z_i\) are
7.5, -2.3, 2.6, 3.7, 1.5, -.5, -1, 4.9, 4.8, 1.6. Three of these are negative and
seven are positive, so the hypothesis that the median of \(Z\) is 0 should
be rejected at significance level \(\alpha\) if

\[ \sum_{i=0}^{3} \binom{10}{i} \left(\frac{1}{2}\right)^{10} \leq \frac{\alpha}{2} \]

Since

\[ \sum_{i=0}^{3} \binom{10}{i} \left(\frac{1}{2}\right)^{10} = \frac{176}{1024} \approx .172 \]

it follows that the hypothesis would be accepted at the 5 percent significance level (indeed,
it would be accepted at all significance levels less than the p-value equal to .344).

Thus, the sign test does not enable us to conclude that the safety program has had
any statistically significant effect, which is in contradiction to the result obtained in
Example 8.4c when it was assumed that the differences were normally distributed. The
reason for this disparity is that the assumption of normality allows us to take into account
not only the number of values greater than 0 (which is all the sign test considers) but also
the magnitude of these values. (The next test to be considered, while still being
nonparametric, improves on the sign test by taking into account whether those values that most
differ from the hypothesized median value \(m_0\) tend to lie on one side of \(m_0\), that is,
whether they tend to be primarily bigger or smaller than \(m_0\).)
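The .172 and .344 above can be checked with exact rational arithmetic. The following minimal sketch uses the ten differences listed in the text. The variable names are our own.

```python
from fractions import Fraction
from math import comb

# Differences Z_i = X_i - Y_i for the 10 plants, as listed above
z = [7.5, -2.3, 2.6, 3.7, 1.5, -0.5, -1, 4.9, 4.8, 1.6]

n = len(z)
v = sum(1 for d in z if d < 0)   # number of differences below the hypothesized median 0
k = min(v, n - v)                # the smaller of the two sign counts

# Exact P{Bin(n, 1/2) <= k}
tail = Fraction(sum(comb(n, i) for i in range(k + 1)), 2 ** n)
p_value = min(Fraction(1), 2 * tail)

print(v, tail, float(p_value))   # prints: 3 11/64 0.34375
```

Note that 11/64 is the reduced form of 176/1024.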

We can also use the sign test to test one-sided hypotheses about a population median.
For instance, suppose that we want to test

\[ H_0: m \leq m_0 \quad \text{versus} \quad H_1: m > m_0 \]

where \(m\) is the population median and \(m_0\) is some specified value. Let \(p\) denote the
probability that a population value is less than \(m_0\), and note that if the null hypothesis is
true then \(p \geq 1/2\), and if the alternative is true then \(p < 1/2\) (see Figure 12.1).

To use the sign test to test the preceding hypothesis, choose a random sample of \(n\)
members of the population. If \(v\) of them have values that are less than \(m_0\), then the
resulting p-value is the probability that a value of \(v\) or smaller would have occurred by
chance if each element had probability 1/2 of being less than \(m_0\). That is,

\[ p\text{-value} = P\{\text{Bin}(n, 1/2) \leq v\} \]

[FIGURE 12.1: area = 1/2]

EXAMPLE 12.2b A financial institution has decided to open an office in a certain
community if it can be established that the median annual income of families in the community is
greater than $90,000. To obtain information, a random sample of 80 families was chosen,
and the family incomes determined. If 28 of these families had annual incomes below
and 52 had annual incomes above $90,000, is this significant enough to establish, say, at
the 5 percent level of significance, that the median annual income in the community is
greater than $90,000?

SOLUTION We need to see if the data are sufficient to enable us to reject the null hypothesis
when testing (with incomes measured in thousands of dollars)

\[ H_0: m \leq 90 \quad \text{versus} \quad H_1: m > 90 \]

The preceding is equivalent to testing

\[ H_0: p \geq 1/2 \quad \text{versus} \quad H_1: p < 1/2 \]

where \(p\) is the probability that a randomly chosen member of the population has an annual
income of less than $90,000. Therefore, the p-value is

\[ p\text{-value} = P\{\text{Bin}(80, 1/2) \leq 28\} = .0048 \]

and so the null hypothesis that the median income is less than or equal to $90,000 is
rejected. ■

A test of the one-sided null hypothesis that the median is at least \(m_0\) is obtained
similarly. If a random sample of size \(n\) is chosen, and \(v\) of the resulting values are less
than \(m_0\), then the resulting p-value is

\[ p\text{-value} = P\{\text{Bin}(n, 1/2) \geq v\} \]
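The following minimal Python sketch implements the two one-sided sign tests just described. The function names are hypothetical. In both functions, \(v\) is the number of observations below \(m_0\). The upper-tail probability is obtained as a complement, \(P\{\text{Bin}(n, 1/2) \geq v\} = 1 - P\{\text{Bin}(n, 1/2) \leq v - 1\}\).

```python
from math import comb

def binom_cdf_half(n, k):
    """Exact P{Bin(n, 1/2) <= k}."""
    if k < 0:
        return 0.0
    return sum(comb(n, i) for i in range(min(k, n) + 1)) / 2 ** n

def sign_test_upper(n, v):
    """p-value for H0: m <= m0 versus H1: m > m0, i.e. P{Bin(n, 1/2) <= v}."""
    return binom_cdf_half(n, v)

def sign_test_lower(n, v):
    """p-value for H0: m >= m0 versus H1: m < m0, i.e. P{Bin(n, 1/2) >= v}."""
    return 1.0 - binom_cdf_half(n, v - 1)

# Example 12.2b: 28 of 80 family incomes fall below $90,000
print(round(sign_test_upper(80, 28), 4))
```

Running it on the data of Example 12.2b should give a value close to the .0048 reported there.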



12.3 THE SIGNED RANK TEST 

The sign test can be employed to test the hypothesis that the median of a continuous
distribution \(F\) is equal to a specified value \(m_0\). However, in many applications one is really
interested in testing not only that the median is equal to \(m_0\) but that the distribution is
symmetric about \(m_0\). That is, if \(X\) has distribution function \(F\), then one is often interested
in testing the hypothesis \(H_0: P\{X < m_0 - a\} = P\{X > m_0 + a\}\) for all \(a > 0\)
(see Figure 12.2). Whereas the sign test could still be employed to test the foregoing
hypothesis, it suffers in that it compares only the number of data values that are less than
\(m_0\) with the number that are greater than \(m_0\) and does not take into account whether or
not one of these sets tends to be further away from \(m_0\) than the other. A nonparametric test
that does take this into account is the so-called signed rank test. It is described as follows.
Let \(Y_i = X_i - m_0\), \(i = 1, \ldots, n\), and rank (that is, order) the absolute values
\(|Y_1|, |Y_2|, \ldots, |Y_n|\). Set, for \(j = 1, \ldots, n\),

\[ I_j = \begin{cases} 1 & \text{if the } j\text{th smallest value comes from a data value that is smaller than } m_0 \\ 0 & \text{otherwise} \end{cases} \]

Now, whereas \(\sum_{j=1}^n I_j\) represents the test statistic for the sign test, the signed rank test
uses the statistic \(T = \sum_{j=1}^n j I_j\). That is, like the sign test it considers those data values
that are less than \(m_0\), but rather than giving equal weight to each such value it gives larger
weights to those data values that are farthest away from \(m_0\).

EXAMPLE 12.3a If \(n = 4\), \(m_0 = 2\), and the data values are \(X_1 = 4.2\), \(X_2 = 1.8\), \(X_3 = 5.3\),
\(X_4 = 1.7\), then the rankings of \(|X_i - 2|\) are .2, .3, 2.2, 3.3. Since the first of these
values, namely .2, comes from the data point \(X_2\), which is less than 2, it follows that
\(I_1 = 1\). Similarly, \(I_2 = 1\), and \(I_3\) and \(I_4\) equal 0. Hence, the value of the test statistic is
\(T = 1 + 2 = 3\). ■

FIGURE 12.2 A symmetric density: \(m = 3\).

\[ f(x) = \begin{cases} \max\{0,\ .4(x - 3) + \sqrt{.4}\} & x \leq 3 \\ \max\{0,\ -.4(x - 3) + \sqrt{.4}\} & x > 3 \end{cases} \]
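The following minimal Python sketch computes \(T = \sum_j j I_j\) directly. It orders the indices by \(|X_i - m_0|\) and adds the ranks of those values that lie below \(m_0\). The function name is our own. As in the text, it assumes a continuous population, so there are no ties and no observation equals \(m_0\).

```python
def signed_rank_statistic(data, m0):
    """T = sum of the ranks of |X_i - m0| belonging to observations below m0.

    Assumes no ties among the |X_i - m0| and no observation equal to m0.
    """
    y = [x - m0 for x in data]
    # indices of the observations, ordered by |Y_i| (rank 1 = smallest)
    order = sorted(range(len(y)), key=lambda i: abs(y[i]))
    return sum(rank for rank, i in enumerate(order, start=1) if y[i] < 0)

# Example 12.3a
print(signed_rank_statistic([4.2, 1.8, 5.3, 1.7], 2))   # prints 3
```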

When \(H_0\) is true, the mean and variance of the test statistic \(T\) are easily computed.
This is accomplished by noting that, since the distribution of \(Y_j = X_j - m_0\) is symmetric
about 0, for any given value of \(|Y_j|\), say \(|Y_j| = y\), it is equally likely that either \(Y_j = y\)
or \(Y_j = -y\). From this fact it can be seen that under \(H_0\), \(I_1, \ldots, I_n\) will be independent
random variables such that

\[ P\{I_j = 1\} = \tfrac{1}{2} = P\{I_j = 0\}, \qquad j = 1, \ldots, n \]

Hence, we can conclude that under \(H_0\),

\[ E[T] = E\left[\sum_{j=1}^{n} j I_j\right] = \sum_{j=1}^{n} \frac{j}{2} = \frac{n(n+1)}{4} \qquad (12.3.1) \]

\[ \text{Var}(T) = \text{Var}\left(\sum_{j=1}^{n} j I_j\right) = \sum_{j=1}^{n} j^2\, \text{Var}(I_j) = \sum_{j=1}^{n} \frac{j^2}{4} = \frac{n(n+1)(2n+1)}{24} \qquad (12.3.2) \]

where the fact that the variance of the Bernoulli random variable \(I_j\) is \(\frac{1}{2}\left(1 - \frac{1}{2}\right) = \frac{1}{4}\)
is used.
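For small \(n\), Equations 12.3.1 and 12.3.2 can be checked by brute force. Under \(H_0\), the \(2^n\) patterns of \((I_1, \ldots, I_n)\) are equally likely. The following minimal sketch (the helper name is ours) enumerates those patterns and compares the exact moments with the formulas.

```python
from fractions import Fraction
from itertools import product

def null_moments(n):
    """Exact mean and variance of T = sum j*I_j when the I_j are iid Bernoulli(1/2),
    found by enumerating all 2**n equally likely indicator patterns."""
    values = [sum(j * b for j, b in zip(range(1, n + 1), bits))
              for bits in product((0, 1), repeat=n)]
    mean = Fraction(sum(values), len(values))
    var = Fraction(sum(t * t for t in values), len(values)) - mean ** 2
    return mean, var

for n in range(1, 9):
    mean, var = null_moments(n)
    assert mean == Fraction(n * (n + 1), 4)                 # Equation 12.3.1
    assert var == Fraction(n * (n + 1) * (2 * n + 1), 24)   # Equation 12.3.2
print("Equations 12.3.1 and 12.3.2 agree with enumeration for n = 1, ..., 8")
```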

It can be shown that for moderately large values of \(n\) (\(n > 25\) is often quoted as being
sufficient) \(T\) will, when \(H_0\) is true, have approximately a normal distribution with mean
and variance as given by Equations 12.3.1 and 12.3.2. Although this approximation can be
used to derive an approximate level \(\alpha\) test of \(H_0\) (which has been the usual approach until
the recent advent of fast and cheap computational power), we shall not pursue this approach
but rather will determine the p-value for given test data by an explicit computation of the
relevant probabilities. This is accomplished as follows.

Suppose we desire a significance level \(\alpha\) test of \(H_0\). Since the alternative hypothesis is
that the median is not equal to \(m_0\), a two-sided test is called for. That is, if the observed
value of \(T\) is equal to \(t\), then \(H_0\) should be rejected if either

\[ P_{H_0}\{T \leq t\} \leq \frac{\alpha}{2} \quad \text{or} \quad P_{H_0}\{T \geq t\} \leq \frac{\alpha}{2} \qquad (12.3.3) \]

The p-value of the test data when \(T = t\) is given by

\[ p\text{-value} = 2\min\left(P_{H_0}\{T \leq t\},\ P_{H_0}\{T \geq t\}\right) \qquad (12.3.4) \]

That is, if \(T = t\), the signed rank test calls for rejection of the null hypothesis if the
significance level \(\alpha\) is at least as large as this p-value. The amount of computation necessary
to compute the p-value can be reduced by utilizing the following equality (whose proof
will be given at the end of the section).

\[ P_{H_0}\{T \geq t\} = P_{H_0}\left\{T \leq \frac{n(n+1)}{2} - t\right\} \]

Using Equation 12.3.4, the p-value is given by

\[ p\text{-value} = 2\min\left(P_{H_0}\{T \leq t\},\ P_{H_0}\left\{T \leq \frac{n(n+1)}{2} - t\right\}\right) = 2P_{H_0}\{T \leq t^*\} \]

where

\[ t^* = \min\left(t,\ \frac{n(n+1)}{2} - t\right) \]
It remains to compute \(P_{H_0}\{T \leq t^*\}\). To do so, let \(P_k(i)\) denote the probability, under
\(H_0\), that the signed rank statistic \(T\) will be less than or equal to \(i\) when the sample size is \(k\).
We will determine a recursive formula for \(P_k(i)\) starting with \(k = 1\). When \(k = 1\), since
there is only a single data value, which, when \(H_0\) is true, is equally likely to be either less
than or greater than \(m_0\), it follows that \(T\) is equally likely to be either 0 or 1. Thus

\[ P_1(i) = \begin{cases} 0 & i < 0 \\ \frac{1}{2} & i = 0 \\ 1 & i \geq 1 \end{cases} \qquad (12.3.5) \]



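Now suppose the sample size is \(k\). To compute \(P_k(i)\), we condition on the value of \(I_k\) as
follows:

\[ \begin{aligned}
P_k(i) &= P_{H_0}\left\{\sum_{j=1}^{k} j I_j \leq i \,\middle|\, I_k = 1\right\} P_{H_0}\{I_k = 1\} + P_{H_0}\left\{\sum_{j=1}^{k} j I_j \leq i \,\middle|\, I_k = 0\right\} P_{H_0}\{I_k = 0\} \\
&= P_{H_0}\left\{\sum_{j=1}^{k-1} j I_j \leq i - k \,\middle|\, I_k = 1\right\} P_{H_0}\{I_k = 1\} + P_{H_0}\left\{\sum_{j=1}^{k-1} j I_j \leq i \,\middle|\, I_k = 0\right\} P_{H_0}\{I_k = 0\} \\
&= P_{H_0}\left\{\sum_{j=1}^{k-1} j I_j \leq i - k\right\} P_{H_0}\{I_k = 1\} + P_{H_0}\left\{\sum_{j=1}^{k-1} j I_j \leq i\right\} P_{H_0}\{I_k = 0\}
\end{aligned} \]

where the last equality utilized the independence of \(I_1, \ldots, I_{k-1}\) and \(I_k\) (when \(H_0\) is
true). Now \(\sum_{j=1}^{k-1} j I_j\) has the same distribution as the signed rank statistic of a sample
of size \(k - 1\), and since

\[ P_{H_0}\{I_k = 1\} = P_{H_0}\{I_k = 0\} = \tfrac{1}{2} \]

we see that

\[ P_k(i) = \tfrac{1}{2} P_{k-1}(i - k) + \tfrac{1}{2} P_{k-1}(i) \qquad (12.3.6) \]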



Starting with Equation 12.3.5, the recursion given by Equation 12.3.6 can be successively
employed to compute \(P_2(\cdot)\), then \(P_3(\cdot)\), and so on, stopping when the desired value \(P_n(t^*)\)
has been obtained.
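The following minimal Python sketch implements the recursion of Equations 12.3.5 and 12.3.6 with memoization and exact fractions. It is not the text's Program 12.3, and the function names are our own. It assumes an integer value \(t\) of the statistic, so there are no tied ranks.

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def signed_rank_cdf(k, i):
    """P_k(i): probability under H0 that the signed rank statistic of a
    sample of size k is <= i (Equations 12.3.5 and 12.3.6)."""
    if i < 0:
        return Fraction(0)
    if k == 1:  # Equation 12.3.5: T is 0 or 1, each with probability 1/2
        return Fraction(1, 2) if i == 0 else Fraction(1)
    # Equation 12.3.6: condition on I_k, the indicator attached to the largest |Y|
    return (Fraction(1, 2) * signed_rank_cdf(k - 1, i - k)
            + Fraction(1, 2) * signed_rank_cdf(k - 1, i))

def signed_rank_pvalue(n, t):
    """Two-sided p-value 2 * P_n(t*), with t* = min(t, n(n+1)/2 - t), capped at 1."""
    t_star = min(t, n * (n + 1) // 2 - t)
    return min(Fraction(1), 2 * signed_rank_cdf(n, t_star))

print(signed_rank_cdf(4, 3))                  # 5/16
print(signed_rank_pvalue(4, 3))               # 5/8
print(float(signed_rank_pvalue(20, 142)))
```

The first two lines correspond to the data of Example 12.3a. The last line uses the sample size and statistic of Example 12.3c and should come out close to the .177 reported there.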

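EXAMPLE 12.3b For the data of Example 12.3a,

\[ t^* = \min\left(3,\ \frac{4 \cdot 5}{2} - 3\right) = 3 \]

Hence the p-value is \(2P_4(3)\), which is computed as follows:

\[ \begin{aligned}
P_2(0) &= \tfrac{1}{2}[P_1(-2) + P_1(0)] = \tfrac{1}{4} \\
P_2(1) &= \tfrac{1}{2}[P_1(-1) + P_1(1)] = \tfrac{1}{2} \\
P_2(2) &= \tfrac{1}{2}[P_1(0) + P_1(2)] = \tfrac{3}{4} \\
P_2(3) &= \tfrac{1}{2}[P_1(1) + P_1(3)] = 1 \\
P_3(0) &= \tfrac{1}{2}[P_2(-3) + P_2(0)] = \tfrac{1}{8} \\
P_3(1) &= \tfrac{1}{2}[P_2(-2) + P_2(1)] = \tfrac{1}{4} \\
P_3(2) &= \tfrac{1}{2}[P_2(-1) + P_2(2)] = \tfrac{3}{8} \\
P_3(3) &= \tfrac{1}{2}[P_2(0) + P_2(3)] = \tfrac{5}{8} \\
P_4(0) &= \tfrac{1}{2}[P_3(-4) + P_3(0)] = \tfrac{1}{16} \\
P_4(1) &= \tfrac{1}{2}[P_3(-3) + P_3(1)] = \tfrac{1}{8} \\
P_4(2) &= \tfrac{1}{2}[P_3(-2) + P_3(2)] = \tfrac{3}{16} \\
P_4(3) &= \tfrac{1}{2}[P_3(-1) + P_3(3)] = \tfrac{5}{16}
\end{aligned} \]

since \(P_k(i) = 0\) whenever \(i < 0\). Thus the p-value is \(2P_4(3) = 5/8\). ■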



Program 12.3 will use the recursion in Equations 12.3.5 and 12.3.6 to compute the
p-value of the signed rank test data. The input needed is the sample size \(n\) and the value
of the test statistic \(T\).

EXAMPLE 12.3c Suppose we are interested in determining whether a certain population 
has an underlying probability distribution that is symmetric about 0. If a sample of size 20 
from this population results in a signed rank test statistic of value 142, what conclusion 
can we draw at the 10 percent level of significance? 
SOLUTION Running Program 12.3 yields that

\[ p\text{-value} = .177 \]

Thus the hypothesis that the population distribution is symmetric about 0 is accepted at
the \(\alpha = .10\) level of significance. ■

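We end this section with a proof of the equality

\[ P_{H_0}\{T \geq t\} = P_{H_0}\left\{T \leq \frac{n(n+1)}{2} - t\right\} \]

To verify the foregoing, note first that \(1 - I_j\) will equal 1 if the \(j\)th smallest value of
\(|Y_1|, \ldots, |Y_n|\) comes from a data value larger than \(m_0\), and it will equal 0 otherwise.
Hence, if we let

\[ \bar{T} = \sum_{j=1}^{n} j(1 - I_j) \]

then \(\bar{T}\) will represent the sum of the ranks of the \(|Y_j|\) that correspond to data values larger
than \(m_0\). By symmetry, \(\bar{T}\) will have, under \(H_0\), the same distribution as \(T\). Now

\[ \bar{T} = \sum_{j=1}^{n} j - \sum_{j=1}^{n} j I_j = \frac{n(n+1)}{2} - T \]

and so

\[ \begin{aligned}
P\{T \geq t\} &= P\{\bar{T} \geq t\} \qquad \text{since } T \text{ and } \bar{T} \text{ have the same distribution} \\
&= P\left\{\frac{n(n+1)}{2} - T \geq t\right\} \\
&= P\left\{T \leq \frac{n(n+1)}{2} - t\right\}
\end{aligned} \]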
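The identity just proved can also be confirmed numerically. The following minimal sketch (the helper name is ours) tabulates the exact null distribution of \(T\) for a small \(n\) by enumerating all \(2^n\) sign patterns. It then compares both sides of the identity for every \(t\), as counts multiplied by \(2^n\).

```python
from itertools import product

def null_distribution(n):
    """counts[s] = number of the 2**n equally likely sign patterns giving T = s under H0."""
    counts = [0] * (n * (n + 1) // 2 + 1)
    for bits in product((0, 1), repeat=n):
        counts[sum(j * b for j, b in zip(range(1, n + 1), bits))] += 1
    return counts

n = 6
counts = null_distribution(n)
total = n * (n + 1) // 2
for t in range(total + 1):
    upper = sum(counts[t:])                 # 2**n * P{T >= t}
    lower = sum(counts[:total - t + 1])     # 2**n * P{T <= n(n+1)/2 - t}
    assert upper == lower
print("P{T >= t} = P{T <= n(n+1)/2 - t} holds for n = 6 and every t")
```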



REMARK ON TIES

Since we have assumed that the population distribution is continuous, there is no
possibility of ties; that is, with probability 1, all observations will have different values.
However, since in practice all measurements are quantized, ties are always a distinct
possibility. If ties do occur, then the weights given to the values less than \(m_0\) should
be the average of the different weights they could have had if the values had differed
slightly. For instance, if \(m_0 = 0\) and the data values are 2, 4, 7, -5, -7, then the ordered
absolute values are 2, 4, 5, 7, 7. The value -5 has rank 3, and since the two absolute values
equal to 7 occupy ranks 4 and 5, the value -7 receives the average rank 4.5. Hence the value
of the test statistic is \(T = 3 + 4.5 = 7.5\). The p-value should be computed as when we assumed that all
values were distinct. (Although technically this is not correct, the discrepancy is usually
minor.)
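The averaging rule for ties can be sketched in Python as follows. The function name is hypothetical. The sketch assumes that no observation equals \(m_0\) exactly, and it detects ties by exact equality of \(|X_i - m_0|\). This is reliable for integer or otherwise exactly representable data such as the example above.

```python
def signed_rank_with_ties(data, m0):
    """Signed rank statistic T in which tied |X_i - m0| values share their average rank.

    Assumes no observation equals m0; ties are detected by exact equality.
    """
    y = [x - m0 for x in data]
    order = sorted(range(len(y)), key=lambda i: abs(y[i]))
    ranks = [0.0] * len(y)
    pos = 0
    while pos < len(order):
        end = pos
        # extend the block of positions whose absolute values are tied
        while end + 1 < len(order) and abs(y[order[end + 1]]) == abs(y[order[pos]]):
            end += 1
        avg = (pos + 1 + end + 1) / 2   # average of the ranks pos+1, ..., end+1
        for idx in order[pos:end + 1]:
            ranks[idx] = avg
        pos = end + 1
    return sum(r for r, yi in zip(ranks, y) if yi < 0)

print(signed_rank_with_ties([2, 4, 7, -5, -7], 0))   # prints 7.5
```

When there are no ties, the function reduces to the ordinary statistic \(T\).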



12.4 THE TWO-SAMPLE PROBLEM 

Suppose that one is considering two different methods for producing items having 
measurable characteristics with an interest in determining whether the two methods result 
in statistically identical items. 

To attack this problem let \(X_1, \ldots, X_n\) denote a sample of the measurable values of \(n\)
items produced by method 1, and, similarly, let \(Y_1, \ldots, Y_m\) be the corresponding values
of \(m\) items produced by method 2. If we let \(F\) and \(G\), both assumed to be continuous,
denote the distribution functions of the two samples, respectively, then the hypothesis we
wish to test is \(H_0: F = G\).

One procedure for testing \(H_0\), which is known by such names as the rank sum test,
the Mann-Whitney test, or the Wilcoxon test, calls initially for ranking, or ordering,
the \(n + m\) data values \(X_1, \ldots, X_n, Y_1, \ldots, Y_m\). Since we are assuming that \(F\) and \(G\) are
continuous, this ranking will be unique; that is, there will be no ties. Give the smallest
data value rank 1, the second smallest rank 2, ..., and the \((n + m)\)th smallest rank \(n + m\).
Now, for \(i = 1, \ldots, n\), let

\[ R_i = \text{rank of the data value } X_i \]

The rank sum test utilizes the test statistic \(T\) equal to the sum of the ranks from the first
sample, that is,

\[ T = \sum_{i=1}^{n} R_i \]



EXAMPLE 12.4a An experiment designed to compare two treatments against corrosion
yielded the following data on pieces of wire subjected to the two treatments.

Treatment 1   65.2, 67.1, 69.4, 78.2, 74, 80.3
Treatment 2   59.4, 72.1, 68, 66.2, 58.5

(The data represent the maximum depth of pits in units of one thousandth of an inch.)
The ordered values are 58.5, 59.4, 65.2*, 66.2, 67.1*, 68, 69.4*, 72.1, 74*, 78.2*, 80.3*
with an asterisk noting that the data value was from sample 1. Hence, the value of the test
statistic is \(T = 3 + 5 + 7 + 9 + 10 + 11 = 45\). ■
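The following minimal Python sketch reproduces this computation. It sorts the combined sample and adds the ranks of the first sample's values. The function name `rank_sum` is our own, and the sketch assumes no ties, as the continuity assumption guarantees.

```python
def rank_sum(first, second):
    """T = sum of the ranks of the first sample within the combined sample (no ties)."""
    combined = sorted(first + second)
    rank = {value: r for r, value in enumerate(combined, start=1)}
    return sum(rank[x] for x in first)

treatment1 = [65.2, 67.1, 69.4, 78.2, 74, 80.3]
treatment2 = [59.4, 72.1, 68, 66.2, 58.5]
print(rank_sum(treatment1, treatment2))   # prints 45
print(rank_sum(treatment2, treatment1))   # prints 21
```

The two rank sums add to \(1 + 2 + \cdots + 11 = 66\). Either sample can therefore serve as "the first sample".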
Suppose that we desire a significance level \(\alpha\) test of \(H_0\). If the observed value of \(T\) is
\(T = t\), then \(H_0\) should be rejected if either

\[ P_{H_0}\{T \leq t\} \leq \frac{\alpha}{2} \quad \text{or} \quad P_{H_0}\{T \geq t\} \leq \frac{\alpha}{2} \qquad (12.4.1) \]

That is, the hypothesis that the two samples are equivalent should be rejected if the sum of
the ranks from the first sample is either too small or too large to be explained by chance.
Since for integral \(t\),

\[ P\{T \geq t\} = 1 - P\{T < t\} = 1 - P\{T \leq t - 1\} \]

it follows from Equation 12.4.1 that \(H_0\) should be rejected if either

\[ P_{H_0}\{T \leq t\} \leq \frac{\alpha}{2} \quad \text{or} \quad P_{H_0}\{T \leq t - 1\} \geq 1 - \frac{\alpha}{2} \qquad (12.4.2) \]

To compute the probabilities in Equation 12.4.2, let \(P(N, M, K)\) denote the
probability that the sum of the ranks of the first sample will be less than or equal to \(K\)
when the sample sizes are \(N\) and \(M\) and \(H_0\) is true. We will now determine a
recursive formula for \(P(N, M, K)\), which will then allow us to obtain the desired quantities
\(P(n, m, t) = P_{H_0}\{T \leq t\}\) and \(P(n, m, t - 1)\).

To compute the probability that the sum of the ranks of the first sample is less than or
equal to \(K\) when \(N\) and \(M\) are the sample sizes and \(H_0\) is true, let us condition on whether
the largest of the \(N + M\) data values belongs to the first or second sample. If it belongs to
the first sample, then the sum of the ranks of this sample is equal to \(N + M\) plus the sum
of the ranks of the other \(N - 1\) values from the first sample. Hence this sum will be less
than or equal to \(K\) if the sum of the ranks of the other \(N - 1\) values is less than or equal
to \(K - (N + M)\). But since the remaining \(N - 1 + M\) values (that is, all but the largest)
all come from the same distribution (when \(H_0\) is true), it follows that the sum of
the ranks of \(N - 1\) of them will be less than or equal to \(K - (N + M)\) with probability
\(P(N - 1, M, K - N - M)\). By a similar argument we can show that, given that the largest
value is from the second sample, the sum of the ranks of the first sample will be less than
or equal to \(K\) with probability \(P(N, M - 1, K)\). Also, since the largest value is equally
likely to be any of the \(N + M\) values \(X_1, \ldots, X_N, Y_1, \ldots, Y_M\), it follows that it will come
from the first sample with probability \(N/(N + M)\). Putting these together, we thus obtain
that

\[ P(N, M, K) = \frac{N}{N + M}\, P(N - 1, M, K - N - M) + \frac{M}{N + M}\, P(N, M - 1, K) \qquad (12.4.3) \]
Starting with the boundary conditions

\[ P(0, M, K) = \begin{cases} 0 & K < 0 \\ 1 & K \geq 0 \end{cases} \]

(an empty first sample has rank sum 0), Equation 12.4.3 can be solved recursively to obtain \(P(n, m, t - 1)\) and \(P(n, m, t)\).

EXAMPLE 12.4b Suppose we wanted to determine \(P(2, 1, 3)\). We use Equation 12.4.3 as
follows:

\[ P(2, 1, 3) = \tfrac{2}{3} P(1, 1, 0) + \tfrac{1}{3} P(2, 0, 3) \]

and

\[ P(1, 1, 0) = \tfrac{1}{2} P(0, 1, -2) + \tfrac{1}{2} P(1, 0, 0) = 0 \]

\[ P(2, 0, 3) = P(1, 0, 1) = P(0, 0, 0) = 1 \]

Hence,

\[ P(2, 1, 3) = \tfrac{1}{3} \]

which checks since in order for the sum of the ranks of the two \(X\) values to be less than
or equal to 3, the largest of the values \(X_1, X_2, Y_1\) must be \(Y_1\), which, when \(H_0\) is true, has
probability \(\tfrac{1}{3}\). ■

Since the rank sum test calls for rejection when either

\[ 2P(n, m, t) \leq \alpha \quad \text{or} \quad \alpha \geq 2[1 - P(n, m, t - 1)] \]

it follows that the p-value of the test statistic when \(T = t\) is

\[ p\text{-value} = 2\min\{P(n, m, t),\ 1 - P(n, m, t - 1)\} \]
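The following minimal Python sketch implements the recursion of Equation 12.4.3 with memoization and exact fractions, together with the p-value formula just given. It is our own sketch, not the text's Program 12.4. It assumes the boundary values \(P(0, M, K) = 1\) for \(K \geq 0\) and \(P(N, M, K) = 0\) for \(K < 0\). When \(M = 0\), the largest value must come from the first sample, so only the first term of the recursion applies.

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def rank_sum_cdf(N, M, K):
    """P(N, M, K): probability under H0 that the first sample's rank sum is <= K."""
    if K < 0:
        return Fraction(0)
    if N == 0:     # an empty first sample has rank sum 0 <= K
        return Fraction(1)
    if M == 0:     # the largest value (rank N) must belong to the first sample
        return rank_sum_cdf(N - 1, 0, K - N)
    # Equation 12.4.3: condition on which sample holds the largest value
    return (Fraction(N, N + M) * rank_sum_cdf(N - 1, M, K - N - M)
            + Fraction(M, N + M) * rank_sum_cdf(N, M - 1, K))

def rank_sum_pvalue(n, m, t):
    """p-value = 2 * min{P(n, m, t), 1 - P(n, m, t - 1)}, capped at 1."""
    return min(Fraction(1), 2 * min(rank_sum_cdf(n, m, t), 1 - rank_sum_cdf(n, m, t - 1)))

print(rank_sum_cdf(2, 1, 3))                  # 1/3 (Example 12.4b)
print(rank_sum_pvalue(5, 6, 21))              # 29/231, about .1255 (Example 12.4c)
print(float(rank_sum_pvalue(9, 13, 72)))      # compare with .03642 (Example 12.4d)
```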

Program 12.4 uses the recursion in Equation 12.4.3 to compute the p-value for the
rank sum test. The input needed is the sizes of the first and second samples and the sum 
of the ranks of the elements of the first sample. Whereas either sample can be designated 
as the first sample, the program will run fastest if the first sample is the one whose sum of 
ranks is smallest. 

EXAMPLE 12.4c In Example 12.4a, taking the treatment 2 data as the first sample, the sizes
of the two samples are 5 and 6, respectively, and the sum of the ranks of the first sample
is 21. Running Program 12.4 yields the result:

\[ p\text{-value} = .1255 \] ■
FIGURE 12.3 Output of Program 12.4 (the p-value in the two-sample rank sum test) with size of sample 1 = 9, size of sample 2 = 13, and sum of the ranks of the first sample = 72: "The p-value is 0.03642".



EXAMPLE 12.4d Suppose that in testing whether 2 production methods yield identical
results, 9 items are produced using the first method and 13 using the second. If, among all
22 items, the sum of the ranks of the 9 items produced by method 1 is 72, what conclusions
would you draw?

SOLUTION Run Program 12.4 to obtain the result shown in Figure 12.3, a p-value of .03642.
Thus, the hypothesis of identical results would be rejected at the 5 percent level of significance. ■

It remains to compute the value of the test statistic T . It is quite efficient to compute 
T directly by first using a standard computer science algorithm (such as quicksort) to sort, 
or order, the n + m values. Another approach, easily programmed, although efficient for 
only small values of n and m, uses the following identity. 



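PROPOSITION 12.4.1 For \(i = 1, \ldots, n\), \(j = 1, \ldots, m\), let

\[ W_{ij} = \begin{cases} 1 & \text{if } X_i > Y_j \\ 0 & \text{otherwise} \end{cases} \]

Then

\[ T = \frac{n(n+1)}{2} + \sum_{i=1}^{n} \sum_{j=1}^{m} W_{ij} \]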



Proof Consider the values \(X_1, \ldots, X_n\) of the first sample and order them. Let \(X_{(i)}\) denote the
\(i\)th smallest, \(i = 1, \ldots, n\). Now consider the rank of \(X_{(i)}\) among all \(n + m\) data values.
This is given by

\[ \text{rank of } X_{(i)} = i + \text{number } j: Y_j < X_{(i)} \]

Summing over \(i\) gives

\[ \sum_{i=1}^{n} \text{rank of } X_{(i)} = \sum_{i=1}^{n} i + \sum_{i=1}^{n} \left(\text{number } j: Y_j < X_{(i)}\right) \qquad (12.4.4) \]

But since the order in which we add terms does not change the sum obtained, we see that

\[ \begin{aligned}
\sum_{i=1}^{n} \text{rank of } X_{(i)} &= \sum_{i=1}^{n} \text{rank of } X_i = T \\
\sum_{i=1}^{n} \left(\text{number } j: Y_j < X_{(i)}\right) &= \sum_{i=1}^{n} \left(\text{number } j: Y_j < X_i\right)
\end{aligned} \qquad (12.4.5) \]

Hence, from Equations 12.4.4 and 12.4.5, we obtain that

\[ T = \sum_{i=1}^{n} i + \sum_{i=1}^{n} \left(\text{number } j: Y_j < X_i\right) = \frac{n(n+1)}{2} + \sum_{i=1}^{n} \sum_{j=1}^{m} W_{ij} \]
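The following minimal Python sketch computes \(T\) from Proposition 12.4.1 by counting the pairs with \(X_i > Y_j\). The function name is our own. The sketch makes \(nm\) comparisons, which is why the text calls this approach easily programmed but efficient only for small \(n\) and \(m\).

```python
def rank_sum_via_counts(first, second):
    """T = n(n+1)/2 + sum_i sum_j W_ij, where W_ij = 1 if X_i > Y_j (Proposition 12.4.1)."""
    n = len(first)
    w = sum(1 for x in first for y in second if x > y)   # number of pairs with X_i > Y_j
    return n * (n + 1) // 2 + w

treatment1 = [65.2, 67.1, 69.4, 78.2, 74, 80.3]
treatment2 = [59.4, 72.1, 68, 66.2, 58.5]
print(rank_sum_via_counts(treatment1, treatment2))   # prints 45 (= 21 + 24)
```

For the data of Example 12.4a this gives 45, agreeing with the value found there by direct ranking.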






12.4.1 The Classical Approximation and Simulation 

The difficulty with employing the recursion in Equation 12.4.3 to compute the p-value
of the two-sample rank sum test statistic is that the amount of computation grows
enormously as the sample sizes increase. For instance, if \(n = m = 200\), then even if we
choose the test statistic to be the smaller sum of ranks, since the sum of all the ranks is
\(1 + 2 + \cdots + 400 = 80{,}200\), it is possible that the test statistic could have a value as large
as 40,100. Hence, there can be as many as \(200 \times 200 \times 40{,}100 = 1.604 \times 10^9\) values of \(P(N, M, K)\) that would
have to be computed to determine the p-value. Thus, for large sample sizes the approach
based on the recursion in Equation 12.4.3 is not viable. Two approximate methods that can
be utilized in such cases are (a) a classical method based on approximating the distribution
of the test statistic and (b) simulation.

(a) The Classical Approximation When the null hypothesis is true and so F = G, all n + m data values come from the same distribution. Thus all (n + m)! possible rankings of the values $X_1, \ldots, X_n, Y_1, \ldots, Y_m$ are equally likely. It follows that the set of ranks of the first sample is probabilistically equivalent to a random choice of n of the (possible rank) values 1, 2, ..., n + m. Using this, it can be shown that T has mean and variance given by



$$E_{H_0}[T] = \frac{n(n+m+1)}{2}, \qquad \mathrm{Var}_{H_0}(T) = \frac{nm(n+m+1)}{12}$$

In addition, it can be shown that when n and m are both of moderate size (both greater than 7 should suffice), T has, under $H_0$, approximately a normal distribution. Hence, when $H_0$ is true

$$\frac{T - \dfrac{n(n+m+1)}{2}}{\sqrt{\dfrac{nm(n+m+1)}{12}}} \;\overset{\text{approx}}{\sim}\; N(0, 1) \tag{12.4.6}$$

Let d denote the absolute value of the difference between the observed value of T and its mean value given above. Then, based on Equation 12.4.6, the approximate p-value is

$$p\text{-value} = P_{H_0}\{|T - E_{H_0}[T]| \ge d\} \approx P\left\{|Z| \ge \frac{d}{\sqrt{nm(n+m+1)/12}}\right\} = 2P\left\{Z \ge \frac{d}{\sqrt{nm(n+m+1)/12}}\right\}$$

where $Z \sim N(0, 1)$.
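Equation 12.4.6 translates directly into a few lines of Python. The sketch below uses a hypothetical function name and takes Φ from the standard library's `statistics.NormalDist`. The two calls correspond to the hand computations of Example 12.4e.

```python
from math import sqrt
from statistics import NormalDist


def rank_sum_normal_pvalue(n, m, t):
    """Approximate two-sided p-value of the rank sum test via Equation 12.4.6.

    n, m: sizes of the first and second samples; t: observed sum of the
    ranks of the first sample.
    """
    mean = n * (n + m + 1) / 2
    variance = n * m * (n + m + 1) / 12
    d = abs(t - mean)                      # distance of t from E[T]
    z = d / sqrt(variance)
    return 2 * (1 - NormalDist().cdf(z))   # 2 P{Z >= d / sd}


print(rank_sum_normal_pvalue(5, 6, 21))    # compare with .1004 in Example 12.4e
print(rank_sum_normal_pvalue(9, 13, 72))   # compare with .0354
```

Small differences in the fourth decimal from the hand computations arise because the text rounds Φ to four places.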



EXAMPLE 12.4e In Example 12.4a, n = 5, m = 6, and the test statistic's value is 21. Since

$$\frac{n(n+m+1)}{2} = 30, \qquad \frac{nm(n+m+1)}{12} = 30$$

we have that d = 9 and so

$$p\text{-value} \approx 2P\left\{Z \ge \frac{9}{\sqrt{30}}\right\} = 2P\{Z \ge 1.643168\} = 2(1 - .9498) = .1004$$

This can be compared with the exact value of .1225 given in Example 12.4c.

In Example 12.4d, n = 9, m = 13, and so

$$\frac{n(n+m+1)}{2} = 103.5, \qquad \frac{nm(n+m+1)}{12} = 224.25$$

Since T = 72, we have that

$$d = |72 - 103.5| = 31.5$$

Thus, the approximate p-value is

$$p\text{-value} \approx 2P\left\{Z \ge \frac{31.5}{\sqrt{224.25}}\right\} = 2P\{Z \ge 2.103509\} = 2(1 - .9823) = .0354$$

This is quite close to the exact p-value of .0364 given in Example 12.4d.

Thus, in the two examples considered, the normal approximation worked quite well in the second example, where the guideline that both sample sizes should exceed 7 held. It did not work as well in the first example, where the guideline did not hold. ■

(b) Simulation If the observed value of the test statistic is T = t, then the p-value is given by

$$p\text{-value} = 2\min\left\{P_{H_0}\{T \ge t\},\, P_{H_0}\{T \le t\}\right\}$$

We can approximate this value by repeatedly simulating a random selection of n of the values 1, 2, ..., n + m and noting on each occasion the sum of the n values. $P_{H_0}\{T \ge t\}$ can then be approximated by the proportion of times that the sum obtained is greater than or equal to t. Likewise, $P_{H_0}\{T \le t\}$ can be approximated by the proportion of times that the sum is less than or equal to t.

A Chapter 12 text disk program approximates the p-value by performing the preceding simulation. The program runs most efficiently when the smaller sample is designated as the first sample.
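The simulation just described can be sketched in Python as follows. This is not the text disk program itself. The function name, the default of 10,000 runs and the optional `seed` are assumptions made for illustration. Each run uses `random.Random.sample` to draw a random set of n ranks from 1, ..., n + m, then records whether their sum is at least t and whether it is at most t.

```python
import random


def rank_sum_simulated_pvalue(n, m, t, runs=10_000, seed=None):
    """Estimate 2 min{P(T >= t), P(T <= t)} by simulating random rank sets."""
    rng = random.Random(seed)
    ranks = range(1, n + m + 1)
    at_least = 0   # number of runs with sum >= t
    at_most = 0    # number of runs with sum <= t
    for _ in range(runs):
        s = sum(rng.sample(ranks, n))   # n ranks chosen without replacement
        if s >= t:
            at_least += 1
        if s <= t:
            at_most += 1
    return 2 * min(at_least, at_most) / runs


# Data of Example 12.4c (n = 5, m = 6, rank sum 21)
print(rank_sum_simulated_pvalue(5, 6, 21, runs=10_000, seed=1))
```

For small samples, you can check the estimate against a complete enumeration of all $\binom{n+m}{n}$ rank sets, for example with `itertools.combinations`. The estimate's standard error shrinks in proportion to $1/\sqrt{\text{runs}}$. Each run costs time proportional to n, so designating the smaller sample as the first keeps the runs cheap.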






Chapter 12: Nonparametric Hypothesis Tests 



FIGURE 12.4 Text disk program approximating the p-value of the two-sample rank sum test by simulation: size of sample 1 is 5, size of sample 2 is 6, sum of the ranks of the first sample is 21, 10,000 simulation runs; the p-value is 0.125.





FIGURE 12.5 Text disk program approximating the p-value of the two-sample rank sum test by simulation: size of sample 1 is 9, size of sample 2 is 13, sum of the ranks of the first sample is 72, 10,000 simulation runs; the p-value is 0.0356.



EXAMPLE 12.4f Running the text disk program on the data of Example 12.4c yields the estimate 0.125 shown in Figure 12.4. This is quite close to the exact value of .1225. Running the program on the data of Example 12.4d yields the estimate 0.0356 shown in Figure 12.5, which is again quite close to the exact value of .0364. ■

Both approximation methods work quite well. When n and m both exceed 7, the normal approximation is usually quite accurate and requires almost no computational time. The simulation approach, on the other hand, can require a great deal of computational time. However, if an immediate answer is not required and great accuracy is desired, simulation can be made as accurate as any prescribed precision by running a sufficiently large number of cases.

12.5 THE RUNS TEST FOR RANDOMNESS 

A basic assumption in much of statistics is that a set of data constitutes a random sample from some population. Sometimes, however, the data are generated not by a truly random process but by one that may follow a trend or a type of cyclical pattern. In this section, we consider a test of the hypothesis $H_0$ that a given data set constitutes a random sample. This test is called the runs test.

To begin, let us suppose that each of the data values is either a 0 or a 1. That is, we assume that each data value can be dichotomized as being either a success or a failure. Let $X_1, \ldots, X_N$ denote the set of data. Any consecutive sequence of either 1's or 0's is called a run. For instance, the data set

1001110010111101000011

contains 11 runs: 6 runs of 1 and 5 runs of 0. Suppose that the data set $X_1, \ldots, X_N$ contains n 1's and m 0's, where n + m = N, and let R denote the number of runs.
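Counting runs is a one-liner with `itertools.groupby`, which groups maximal blocks of equal consecutive items. The sketch below uses hypothetical function names and confirms the count for the data set above: 11 runs, 6 of 1's and 5 of 0's.

```python
from collections import Counter
from itertools import groupby


def count_runs(seq):
    """Number of runs: maximal blocks of identical consecutive symbols."""
    return sum(1 for _ in groupby(seq))


def runs_by_symbol(seq):
    """How many of the runs consist of each symbol."""
    return Counter(symbol for symbol, _ in groupby(seq))


data = "1001110010111101000011"
print(count_runs(data))       # prints: 11
print(runs_by_symbol(data))   # six runs of "1", five runs of "0"
```

Because `groupby` works on any sequence, the same functions apply unchanged to a string of W's and L's.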
Now, if $H_0$ were true, then $X_1, \ldots, X_N$ would be equally likely to be any of the $N!/(n!\,m!)$ permutations of n 1's and m 0's. Therefore, given a total of n 1's and m 0's, the probability mass function under $H_0$ of R, the number of runs, is given by

$$P_{H_0}\{R = k\} = \frac{\text{number of permutations of } n \text{ 1's and } m \text{ 0's resulting in } k \text{ runs}}{\dbinom{n+m}{n}}$$

This number of permutations can be determined explicitly, and it can be shown that

$$P_{H_0}\{R = 2k\} = 2\,\frac{\dbinom{m-1}{k-1}\dbinom{n-1}{k-1}}{\dbinom{m+n}{n}}$$

$$P_{H_0}\{R = 2k+1\} = \frac{\dbinom{m-1}{k-1}\dbinom{n-1}{k} + \dbinom{m-1}{k}\dbinom{n-1}{k-1}}{\dbinom{n+m}{n}} \tag{12.5.1}$$



If the data contain n 1's and m 0's, the runs test rejects the hypothesis that the data constitute a random sample if the observed number of runs is either too large or too small to be explained by chance. Specifically, if the observed number of runs is r, then the p-value of the runs test is

$$p\text{-value} = 2\min\left\{P_{H_0}\{R \ge r\},\, P_{H_0}\{R \le r\}\right\}$$

Program 12.5 uses Equation 12.5.1 to compute the p-value.
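The following is a minimal Python sketch of the kind of computation Program 12.5 performs, written directly from Equation 12.5.1. The function names are hypothetical, and the sketch assumes n ≥ 1 and m ≥ 1. Impossible run counts automatically receive probability 0, because `math.comb(a, b)` returns 0 when b > a.

```python
from math import comb


def runs_pmf(n, m, r):
    """P_H0{R = r} for a random arrangement of n ones and m zeros (Eq. 12.5.1)."""
    if r < 2:
        return 0.0          # with n, m >= 1 there are always at least 2 runs
    k, odd = divmod(r, 2)
    if odd:                 # r = 2k + 1
        count = comb(m - 1, k - 1) * comb(n - 1, k) + comb(m - 1, k) * comb(n - 1, k - 1)
    else:                   # r = 2k
        count = 2 * comb(m - 1, k - 1) * comb(n - 1, k - 1)
    return count / comb(n + m, n)


def runs_pvalue(n, m, r):
    """Two-sided p-value 2 min{P(R >= r), P(R <= r)} of the runs test."""
    lower = sum(runs_pmf(n, m, j) for j in range(2, r + 1))
    upper = sum(runs_pmf(n, m, j) for j in range(r, n + m + 1))
    return min(1.0, 2 * min(lower, upper))   # capped at 1 (a common convention)


print(runs_pvalue(20, 10, 20))   # Example 12.5a: 20 W's, 10 L's, 20 runs
print(runs_pvalue(10, 9, 8))     # Example 12.5b: n = 10, m = 9, 8 runs
```

The two calls give approximately .01845 (as in Figure 12.6) and .357. The cap at 1 is an addition to the formula in the text. It matters only when the observed r lies near the center of the distribution.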

EXAMPLE 12.5a The following is the result of the last 30 games played by an athletic team, with W signifying a win and L a loss.

WWWLWWLWWLWLWWLWWWWLWLWWWLWLWL

Are these data consistent with pure randomness?

SOLUTION To test the hypothesis of randomness, note that the data consist of 20 W's and 10 L's and contain 20 runs. To see whether this justifies rejection at, say, the 5 percent level of significance, we run Program 12.5. Figure 12.6 shows the resulting p-value of .01845. Therefore, the hypothesis of randomness would be rejected at the 5 percent level of significance. (The striking thing about these data is that the team always came back to win after losing a game. This would be quite unlikely if all outcomes containing 20 wins and 10 losses were equally likely.) ■

FIGURE 12.6 Text disk program computing the p-value for the runs test of the hypothesis that a data set of n ones and m zeroes is random: number of 1's is 20, number of 0's is 10, number of runs is 20; the p-value is 0.01845.

The above can also be used to test for randomness when the data values are not just 0's and 1's. To test whether the data $X_1, \ldots, X_N$ constitute a random sample, let s-med denote the sample median. Also let n denote the number of data values that are less than or equal to s-med, and m the number that are greater. (Thus, if N is even and all data values are distinct, then n = m = N/2.) Define $I_1, \ldots, I_N$ by

$$I_j = \begin{cases} 1 & \text{if } X_j \le \text{s-med} \\ 0 & \text{otherwise} \end{cases}$$

Now, if the original data constituted a random sample, then the number of runs in $I_1, \ldots, I_N$ would have the probability mass function given by Equation 12.5.1. It follows that we can apply the preceding runs test to the data values $I_1, \ldots, I_N$ to test that the original data are random.

EXAMPLE 12.5b The lifetimes of 19 successively produced storage batteries are as follows:

145 152 148 155 176 134 184 132 145 162 165
185 174 198 179 194 201 169 182

The sample median is the 10th smallest value, namely, 169. The data indicating whether each successive value is less than or equal to 169 (1) or greater than 169 (0) are as follows:

1111010111100000010

Hence, the number of runs is 8. To determine if this value is statistically significant, we run Program 12.5 (with n = 10, m = 9) to obtain the result:

p-value = .357

Thus the hypothesis of randomness is accepted. ■
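The dichotomization step of Example 12.5b can be sketched as follows. Here `statistics.median` plays the role of s-med, and the helper name is hypothetical. The resulting counts n, m and r are exactly the inputs the runs test needs.

```python
from itertools import groupby
from statistics import median


def median_indicators(data):
    """I_j = 1 if X_j <= s-med, else 0."""
    s_med = median(data)
    return [1 if x <= s_med else 0 for x in data]


# Battery lifetimes of Example 12.5b, in order of production
lifetimes = [145, 152, 148, 155, 176, 134, 184, 132, 145, 162,
             165, 185, 174, 198, 179, 194, 201, 169, 182]
indicators = median_indicators(lifetimes)
n_ones = sum(indicators)                         # values <= s-med
n_zeros = len(indicators) - n_ones               # values > s-med
num_runs = sum(1 for _ in groupby(indicators))
print(n_ones, n_zeros, num_runs)                 # prints: 10 9 8
```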

It can be shown that, when n and m are both large and $H_0$ is true, R has approximately a normal distribution with mean and standard deviation given by

$$\mu = \frac{2nm}{n+m} + 1 \quad\text{and}\quad \sigma = \sqrt{\frac{2nm(2nm - n - m)}{(n+m)^2(n+m-1)}} \tag{12.5.2}$$

Therefore, when n and m are both large

$$P_{H_0}\{R \le r\} = P_{H_0}\left\{\frac{R - \mu}{\sigma} \le \frac{r - \mu}{\sigma}\right\} \approx \Phi\left(\frac{r - \mu}{\sigma}\right)$$

and, similarly,

$$P_{H_0}\{R \ge r\} \approx 1 - \Phi\left(\frac{r - \mu}{\sigma}\right)$$

Hence, for large n and m, the p-value of the runs test for randomness is approximately

$$p\text{-value} \approx 2\min\left\{\Phi\left(\frac{r - \mu}{\sigma}\right),\, 1 - \Phi\left(\frac{r - \mu}{\sigma}\right)\right\}$$

where μ and σ are given by Equation 12.5.2 and r is the observed number of runs.
EXAMPLE 12.5c Suppose that a sequence of sixty 1's and sixty 0's resulted in 75 runs. Since

$$\mu = 61 \quad\text{and}\quad \sigma = \sqrt{\frac{3{,}540}{119}} = 5.454$$

the approximate p-value is

$$p\text{-value} \approx 2\min\{\Phi(2.567),\, 1 - \Phi(2.567)\} = 2 \times (1 - .9949) = .0102$$

On the other hand, running Program 12.5 gives the exact p-value

p-value = .0130

If the number of runs were 70 rather than 75, the approximate p-value would be

$$p\text{-value} \approx 2[1 - \Phi(1.650)] = .0990$$

as opposed to the exact value of

p-value = .1189 ■
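The normal approximation of Equation 12.5.2 is equally short in code. The sketch below uses a hypothetical function name and takes Φ from `statistics.NormalDist`. It reproduces the approximate p-values of Example 12.5c, and it should be trusted only when n and m are both large.

```python
from math import sqrt
from statistics import NormalDist


def runs_normal_pvalue(n, m, r):
    """Approximate runs-test p-value using the normal law of Equation 12.5.2."""
    mu = 2 * n * m / (n + m) + 1
    sigma = sqrt(2 * n * m * (2 * n * m - n - m) / ((n + m) ** 2 * (n + m - 1)))
    phi = NormalDist().cdf((r - mu) / sigma)
    return 2 * min(phi, 1 - phi)


print(runs_normal_pvalue(60, 60, 75))   # about .0102, as in Example 12.5c
print(runs_normal_pvalue(60, 60, 70))   # about .099
```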
Problems

1. A new medicine against hypertension was tested on 18 patients. After 40 days of treatment, the following changes of the diastolic blood pressure were observed.

5, -1, +2, +8, -25, +1, +5, -12, -16, 9, -8, -18, -5, -22, +4, -21, -15, -11

Use the sign test to determine if the medicine has an effect on blood pressure. What is the p-value?

2. An engineering firm is involved in selecting a computer system, and the choice has been narrowed to two manufacturers. The firm submits eight problems to the two computer manufacturers and has each manufacturer measure the number of seconds required to solve the design problem with the manufacturer's software. The times for the eight design problems are given below.

Design problem           1    2    3    4    5    6    7    8
Time with computer A    15   32   17   26   42   29   12   38
Time with computer B    22   29    1   23   46   25   19   47

Determine the p-value of the sign test when testing the hypothesis that there is no difference in the distribution of the time it takes the two types of software to solve problems.

3. The published figure for the median systolic blood pressure of middle-aged men is 128. To determine if there has been any change in this value, a random sample of 100 men has been selected. Test the hypothesis that the median is equal to 128 if

(a) 60 men have readings above 128;

(b) 70 men have readings above 128;

(c) 80 men have readings above 128.

In each case, determine the p-value.

4. To test the hypothesis that the median weight of 16-year-old females from Los Angeles is at least 110 pounds, a random sample of 200 such females was chosen. If 120 females weighed less than 110 pounds, does this discredit the hypothesis? Use the 5 percent level of significance. What is the p-value?

5. In 1987, the national median salary of all U.S. physicians was $124,400. A recent random sample of 14 physicians showed 1990 incomes of (in units of $1,000)

125.5, 130.3, 133.0, 102.6, 198.0, 232.5, 106.8,
114.5, 122.0, 100.0, 118.8, 108.6, 312.7, 125.5

Use these data to test the hypothesis that the median salary of physicians in 1990 was not greater than in 1987. What is the p-value?

6. An experiment was initiated to study the effect of a newly developed gasoline 
detergent on automobile mileage. The following data, representing mileage per 
gallon before and after the detergent was added for each of eight cars, resulted. 



Car   Mileage without Additive   Mileage with Additive
1     24.2                       23.5
2     30.4                       29.6
3     32.7                       32.3
4     19.8                       17.6
5     25.0                       25.3
6     24.9                       25.4
7     22.2                       20.6
8     21.5                       20.7

Find the p-value of the test of the hypothesis that mileage is not affected by the additive when

(a) the sign test is used; 

(b) the signed rank test is used. 

7. Determine the p-value when using the signed rank statistic in Problems 1 and 2.

8. Twelve patients having high albumin content in their blood were treated with 
a medicine. Their blood content of albumin was measured before and after 
treatment. The measured values are shown in the table. 

Blood Content of Albumin^a

Patient   Before Treatment   After Treatment
1         5.02               4.66
2         5.08               5.15
3         4.75               4.30
4         5.25               5.07
5         4.80               5.38
6         5.77               5.10
7         4.85               4.80
8         5.09               4.91
9         6.05               5.22
10        4.77               4.50
11        4.85               4.85
12        5.24               4.56

^a Values given in grams per 100 ml.

Is the effect of the medicine significant at the 5 percent level? 

(a) Use the sign test. 

(b) Use the signed rank test. 

9. An engineer claims that painting the exterior of a particular aircraft affects its 
cruising speed. To check this, the next 10 aircraft off the assembly line were flown 
to determine cruising speed prior to painting, and were then painted and reflown. 
The following data resulted. 





Cruising Speed (knots)

Aircraft   Not Painted   Painted
1          426.1         416.7
2          418.4         403.2
3          424.4         420.1
4          438.5         431.0
5          440.6         432.6
6          421.8         404.2
7          412.2         398.3
8          409.8         405.4
9          427.5         422.8
10         441.2         444.8


Do the data uphold the engineer's claim? 

10. Ten pairs of duplicate spectrochemical determinations for nickel are presented 
below. The readings in column 2 were taken with one type of measuring instrument 
and those in column 3 were taken with another type. 



Sample   Duplicates
1        1.94   2.00
2        1.99   2.09
3        1.98   1.95
4        2.07   2.03
5        2.03   2.08
6        1.96   1.98
7        1.95   2.03
8        1.96   2.03
9        1.92   2.01
10       2.00   2.12



Test the hypothesis, at the 5 percent level of significance, that the two measuring 
instruments give equivalent results. 

11. Let $X_1, \ldots, X_n$ be a sample from the continuous distribution F having median m, and suppose we are interested in testing the hypothesis $H_0: m = m_0$ against the one-sided alternative $H_1: m > m_0$. Present the one-sided analog of the signed rank test. Explain how the p-value would be computed.

12. In a study of bilingual coding, 12 bilingual (French and English) college students 
are divided into two groups. Each group reads an article written in French, and 
each answers a series of 25 multiple-choice questions covering the content of the 
article. For one group the questions are written in French; the other takes the 
examination in English. The score (total correct) for the two groups is: 



Examination in French    11   12   16   22   25   25
Examination in English   10   13   17   19   21   24



Is this evidence at the 5 percent significance level that there is difficulty in 
transferring information from one language to another? 

13. Fifteen cities, of roughly equal size, are chosen for a traffic safety study. Eight of 
them are randomly chosen, and in these cities a series of newspaper articles dealing 
with traffic safety is run over a 1-month period. The number of traffic accidents 
reported in the month following this campaign is as follows: 



Treatment group   19   31   39   45   47   66   74   81
Control group     28   36   44   49   52   52   60

Determine the exact p-value when testing the hypothesis that the articles have not had any effect.

14. Determine the p-value in Problem 13 by 

(a) using the normal approximation; 

(b) using a simulation study. 
15. The following are the burning times in seconds of floating smoke pots of two different types:

Type X   Type Y
481      572
526      537
506      561
511      582
527      501
556      601
661      487
542      558
500      524
491      578

We are interested in testing the hypothesis that the burning time distributions are the same.

(a) Determine the exact p-value.

(b) Determine the p-value yielded by the normal approximation.

(c) Run a simulation study to estimate the p-value.

16. In a 1943 experiment (Whitlock and Bliss, "A bioassay technique for antihelminthics," Journal of Parasitology, 29, pp. 48-58, 10), albino rats were used to study the effectiveness of carbon tetrachloride as a treatment for worms. Each rat received an injection of worm larvae. After 8 days, the rats were randomly divided into 2 groups of 5 each. Each rat in the first group received a dose of .032 cc of carbon tetrachloride, whereas the dosage for each rat in the second group was .063 cc. Two days later the rats were killed, and the number of adult worms in each rat was determined. The numbers detected in the group receiving the .032 dosage were

421, 462, 400, 378, 413

whereas they were

207, 17, 412, 74, 116

for those receiving the .063 dosage. Do the data prove that the larger dosage is more effective than the smaller?

17. In a 10-year study of the dispersal patterns of beavers (Sun and Muller-Schwarze, "Statistical resampling methods in biology: A case study of beaver dispersal patterns," American Journal of Mathematical and Management Sciences, 16, pp. 463-502, 1996), a total of 332 beavers were trapped in Allegheny State Park in southwestern New York. The beavers were tagged (so as to be identifiable when later caught) and released. Over time a total of 32 of them, 9 female and 23 male, were discovered to have resettled in other sites. The following data give the dispersal distances (in kilometers) between these beavers' original and resettled sites for the females and for the males.

Females: .660, .984, .984, 1.992, 4.368, 6.960, 10.656, 21.600, 31.680
Males: .288, .312, .456, .528, .576, .720, .792, .984, 1.224, 1.584, 2.304, 2.328, 2.496, 2.688, 3.096, 3.408, 4.296, 4.884, 5.928, 6.192, 6.384, 13.224, 27.600

Do the data prove that the dispersal distances are gender related?

18. The m sample problem: Consider m independent random samples of respective sizes $n_1, \ldots, n_m$ from the respective population distributions $F_1, \ldots, F_m$, and consider the problem of testing $H_0: F_1 = F_2 = \cdots = F_m$. To devise a test, let $R_i$ denote the sum of the ranks of the $n_i$ elements of sample i, i = 1, ..., m. Show that when $H_0$ is true

(a) $E[R_i] = \dfrac{n_i(N+1)}{2}$, where $N = \sum_{i=1}^{m} n_i$.

(b) Using the foregoing, and drawing insight from the goodness of fit test statistic, determine an appropriate test statistic for $H_0$.

(c) Explain how an algorithm that generates a random permutation of the integers 1, 2, ..., N can be employed in a simulation study to determine the p-value when using the statistic in part (b) to test $H_0$.

19. A production run of 50 items resulted in 11 defectives, with the defectives occurring on the following items (where the items are numbered by their order of production): 8, 12, 13, 14, 31, 32, 37, 38, 40, 41, 42. Can we conclude that the successive items did not constitute a random sample?

20. The following data represent the successive quality levels of 25 articles: 100, 110, 122, 132, 99, 96, 88, 75, 45, 211, 154, 143, 161, 142, 99, 111, 105, 133, 142, 150, 153, 121, 126, 117, 155. Does it appear that these data are a random sample from some population?

21. Can we use the runs test if we consider whether each data value is less than or 
greater than some predetermined value rather than the value s-med? 

22. The following table (taken from Quinn, W. H., Neal, T. V., and Antunez de Mayolo, 1987, "El Nino occurrences over the past four-and-a-half centuries," Journal of Geophysical Research, 92 (C13), pp. 14,449-14,461) gives the years and magnitude (either moderate or strong) of major El Nino years between 1800 and 1987. Use it to test the hypothesis that the successive El Nino magnitudes constitute a random sample.

Year and Magnitude (0 = moderate, 1 = strong) of Major El Nino Events, 1800-1987

Year   Magnitude   Year   Magnitude   Year   Magnitude
1803   1           1866   0           1918   0
1806   0           1867   0           1923   0
1812   0           1871   1           1925   1
1814   1           1874   0           1930   0
1817   0           1877   1           1932   1
1819   0           1880   0           1939   0
1821   0           1884   1           1940   1
1824   0           1887   0           1943   0
1828   1           1891   1           1951   0
1832   0           1896   0           1953   0
1837   0           1899   1           1957   1
1844   1           1902   0           1965   0
1850   0           1905   0           1972   1
1854   0           1907   0           1976   0
1857   0           1911   1           1982   1
1860   0           1914   0           1987   0
1864   1           1917   1
QUALITY CONTROL



13.1 INTRODUCTION 

Almost every manufacturing process results in some random variation in the items it 
produces. That is, no matter how stringently the process is being controlled, there is 
always going to be some variation between the items produced. This variation is called 
chance variation and is considered to be inherent to the process. However, there is another 
type of variation that sometimes appears. This variation, far from being inherent to the 
process, is due to some assignable cause and usually results in an adverse effect on the 
quality of the items produced. For instance, this latter variation may be caused by a faulty 
machine setting, or by poor quality of the raw materials presently being used, or by incorrect 
software, or human error, or any other of a large number of possibilities. When the only 
variation present is due to chance, and not to assignable cause, we say that the process is 
in control, and a key problem is to determine whether a process is in or is out of control. 

Control charts greatly facilitate the determination of whether a process is in or out of control. A control chart is determined by two numbers: the upper and lower control limits. To employ such a chart, the data generated by the manufacturing process are divided into subgroups, and subgroup statistics (such as the subgroup average and subgroup standard deviation) are computed. When a subgroup statistic does not fall between the lower and upper control limits, we conclude that the process is out of control.

In Sections 13.2 and 13.3, we suppose that the successive items produced have 
measurable characteristics, whose mean and variance are fixed when the process is in 
control. We show how to construct control charts based on subgroup averages (in 
Section 13.2) and on subgroup standard deviations (in Section 13.3). In Section 13.4, 
we suppose that rather than having a measurable characteristic, each item is judged by 
an attribute — that is, it is classified as either acceptable or unacceptable. Then we show 
how to construct control charts that can be used to indicate a change in the quality of 
the items produced. In Section 13.5, we consider control charts in situations where each item produced has a random number of defects. Finally, in Section 13.6 we consider more sophisticated types of control charts. These do not consider each subgroup value in isolation but instead take into account the values of other subgroups. Three control charts of this type are presented: moving-average, exponentially weighted moving-average, and cumulative sum control charts.

13.2 CONTROL CHARTS FOR AVERAGE VALUES:
THE $\bar{X}$-CONTROL CHART

Suppose that when the process is in control, the measurable characteristics of the successive items produced are independent normal random variables with mean $\mu$ and variance $\sigma^2$. However, suppose that, due to special circumstances, the process may go out of control and start producing items having a different distribution. We would like to recognize when this occurs so that we can stop the process, find out what is wrong, and fix it.

Let $X_1, X_2, \ldots$ denote the measurable characteristics of the successive items produced.
To determine when the process goes out of control, we start by breaking the data up into 
subgroups of some fixed size — call it n. The value of n is chosen so as to yield uniformity 
within subgroups. That is, n may be chosen so that all data items within a subgroup were 
produced on the same day, or on the same shift, or using the same settings, and so on. 
In other words, the value of n is chosen so that it is reasonable that a shift in distribution 
would occur between and not within subgroups. Typical values of n are 4, 5, or 6. 

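Let $\bar{X}_i$, i = 1, 2, ..., denote the average of the ith subgroup. That is,

$$\bar{X}_1 = \frac{X_1 + \cdots + X_n}{n}, \qquad \bar{X}_2 = \frac{X_{n+1} + \cdots + X_{2n}}{n}, \qquad \bar{X}_3 = \frac{X_{2n+1} + \cdots + X_{3n}}{n}$$

and so on. Since, when in control, each of the $X_i$ has mean $\mu$ and variance $\sigma^2$, it follows that

$$E(\bar{X}_i) = \mu, \qquad \mathrm{Var}(\bar{X}_i) = \frac{\sigma^2}{n}$$

and so

$$\frac{\bar{X}_i - \mu}{\sqrt{\sigma^2/n}} \sim N(0, 1)$$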
That is, if the process is in control throughout the production of subgroup i, then $\sqrt{n}(\bar{X}_i - \mu)/\sigma$ has a standard normal distribution. A standard normal random variable Z almost always lies between -3 and +3. (Indeed, P{-3 < Z < 3} = .9973.) Hence, if the process is in control throughout the production of the items in subgroup i, we would certainly expect that

$$-3 < \sqrt{n}\,\frac{\bar{X}_i - \mu}{\sigma} < 3$$

or, equivalently, that

$$\mu - \frac{3\sigma}{\sqrt{n}} < \bar{X}_i < \mu + \frac{3\sigma}{\sqrt{n}}$$

The values

$$\text{UCL} = \mu + \frac{3\sigma}{\sqrt{n}}, \qquad \text{LCL} = \mu - \frac{3\sigma}{\sqrt{n}}$$

are called, respectively, the upper and lower control limits.

The $\bar{X}$-control chart is designed to detect a change in the average value of an item produced. It is obtained by plotting the successive subgroup averages $\bar{X}_i$ and declaring that the process is out of control the first time $\bar{X}_i$ does not fall between LCL and UCL (see Figure 13.1).

EXAMPLE 13.2a A manufacturer produces steel shafts having diameters that should be normally distributed with mean 3 mm and standard deviation .1 mm. Successive samples of four shafts have yielded the following sample averages in millimeters.

Sample   $\bar{X}$   Sample   $\bar{X}$
1        3.01        6        3.02
2        2.97        7        3.10
3        3.12        8        3.14
4        2.99        9        3.09
5        3.03        10       3.20

What conclusion should be drawn?
FIGURE 13.1 Control chart for $\bar{X}$, n = size of subgroup. (Axes: subgroup number versus $\bar{X}$; a subgroup average falling outside the control limits is labeled "Out of control.")



SOLUTION When in control, the successive diameters have mean $\mu = 3$ and standard deviation $\sigma = .1$. With n = 4, the control limits are therefore

$$\text{LCL} = 3 - \frac{3(.1)}{\sqrt{4}} = 2.85, \qquad \text{UCL} = 3 + \frac{3(.1)}{\sqrt{4}} = 3.15$$

Sample number 10 falls above the upper control limit, so there is reason to suspect that the mean diameter of shafts now differs from 3. (Judging from the results of Samples 6 through 10, it appears to have increased beyond 3.) ■
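The $\bar{X}$-chart rule of this section can be sketched in Python as follows, using the data of Example 13.2a. The rule is to plot each subgroup average and stop the first time one leaves the interval between LCL and UCL. The function names are hypothetical.

```python
from math import sqrt


def xbar_control_limits(mu, sigma, n):
    """Return (LCL, UCL) = mu -/+ 3 sigma / sqrt(n)."""
    half_width = 3 * sigma / sqrt(n)
    return mu - half_width, mu + half_width


def first_out_of_control(averages, lcl, ucl):
    """1-based index of the first subgroup average outside (LCL, UCL), or None."""
    for i, xbar in enumerate(averages, start=1):
        if not lcl < xbar < ucl:
            return i
    return None


# Example 13.2a: mu = 3 mm, sigma = .1 mm, subgroups of n = 4 shafts
lcl, ucl = xbar_control_limits(3.0, 0.1, 4)
averages = [3.01, 2.97, 3.12, 2.99, 3.03, 3.02, 3.10, 3.14, 3.09, 3.20]
print(round(lcl, 2), round(ucl, 2))              # prints: 2.85 3.15
print(first_out_of_control(averages, lcl, ucl))  # prints: 10
```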



REMARKS 

(a) The foregoing supposes that when the process is in control the underlying distribution 
is normal. However, even if this is not the case, by the central limit theorem it follows that 
the subgroup averages should have a distribution that is roughly normal and so would be 
unlikely to differ from its mean by more than 3 standard deviations. 

(b) It is frequently the case that we do not determine the measurable qualities of all the 
items produced but only those of a randomly chosen subset of items. If this is so then it is 
natural to select, as a subgroup, items that are produced at roughly the same time. 

Note that even when the process is in control, there is a chance (namely, .0027) that a subgroup average will fall outside the control limits. In that case one would incorrectly stop the process and hunt for a nonexistent source of trouble.

Let us now suppose that the process has just gone out of control through a change in the mean value of an item from $\mu$ to $\mu + a$, where a > 0. Assuming things do not change again, how long will it take until the chart indicates that the process is now out of control? To answer this, note that a subgroup average will be within the control limits if

$$-3 < \sqrt{n}\,\frac{\bar{X} - \mu}{\sigma} < 3$$

or, equivalently, if

$$-3 - \frac{a\sqrt{n}}{\sigma} < \sqrt{n}\,\frac{\bar{X} - \mu}{\sigma} - \frac{a\sqrt{n}}{\sigma} < 3 - \frac{a\sqrt{n}}{\sigma}$$

or

$$-3 - \frac{a\sqrt{n}}{\sigma} < \sqrt{n}\,\frac{\bar{X} - \mu - a}{\sigma} < 3 - \frac{a\sqrt{n}}{\sigma}$$

Now $\bar{X}$ is normal with mean $\mu + a$ and variance $\sigma^2/n$, so $\sqrt{n}(\bar{X} - \mu - a)/\sigma$ has a standard normal distribution. Hence the probability that the subgroup average falls within the control limits is

$$P\left\{-3 - \frac{a\sqrt{n}}{\sigma} < Z < 3 - \frac{a\sqrt{n}}{\sigma}\right\} = \Phi\left(3 - \frac{a\sqrt{n}}{\sigma}\right) - \Phi\left(-3 - \frac{a\sqrt{n}}{\sigma}\right) \approx \Phi\left(3 - \frac{a\sqrt{n}}{\sigma}\right)$$

and so the probability that it falls outside is approximately $1 - \Phi(3 - a\sqrt{n}/\sigma)$. For instance, suppose the subgroup size is n = 4 and the mean value increases by 1 standard deviation (that is, $a = \sigma$). Then the subgroup average falls outside the control limits with probability $1 - \Phi(1) = .1587$. Each subgroup average independently falls outside the control limits with probability $1 - \Phi(3 - a\sqrt{n}/\sigma)$. It follows that the number of subgroups needed to detect the shift has a geometric distribution with mean $\{1 - \Phi(3 - a\sqrt{n}/\sigma)\}^{-1}$. (In the case mentioned before with n = 4, the number of subgroups one would have to chart to detect a change in the mean of 1 standard deviation has a geometric distribution with mean $1/.1587 \approx 6.3$.)
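The detection probability and the mean of this geometric distribution can be computed as follows. Unlike the approximation in the text, this sketch keeps both tails, $1 - [\Phi(3 - a\sqrt{n}/\sigma) - \Phi(-3 - a\sqrt{n}/\sigma)]$. The lower-tail term is negligible once the shift a > 0 is moderately large. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def detection_probability(shift, sigma, n):
    """P{subgroup average outside mu -/+ 3 sigma/sqrt(n)} after the mean moves to mu + shift."""
    phi = NormalDist().cdf
    delta = shift * sqrt(n) / sigma
    return 1 - (phi(3 - delta) - phi(-3 - delta))


def mean_subgroups_to_detect(shift, sigma, n):
    """Mean of the geometric number of subgroups charted until the shift is detected."""
    return 1 / detection_probability(shift, sigma, n)


print(detection_probability(1.0, 1.0, 4))     # about .1587
print(mean_subgroups_to_detect(1.0, 1.0, 4))  # about 6.3
print(detection_probability(0.0, 1.0, 4))     # about .0027 (false-alarm rate)
```

With shift 0, the same function returns the in-control false-alarm probability of .0027 mentioned earlier in this section.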

13.2.1 Case of Unknown $\mu$ and $\sigma$

If one is just starting up a control chart and does not have reliable historical data, then $\mu$ and $\sigma$ are not known and must be estimated. To do so, we employ k of the subgroups, where k should be chosen so that $k \ge 20$ and $nk \ge 100$. If $\bar{X}_i$, i = 1, ..., k, is the average of the ith subgroup, then it is natural to estimate $\mu$ by $\bar{\bar{X}}$, the average of these subgroup averages. That is,

$$\bar{\bar{X}} = \frac{\bar{X}_1 + \cdots + \bar{X}_k}{k}$$
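To estimate $\sigma$, let $S_i$ denote the sample standard deviation of the ith subgroup, i = 1, ..., k. That is,

$$S_1 = \sqrt{\sum_{i=1}^{n} \frac{(X_i - \bar{X}_1)^2}{n-1}}, \qquad S_2 = \sqrt{\sum_{i=1}^{n} \frac{(X_{n+i} - \bar{X}_2)^2}{n-1}}, \qquad \ldots, \qquad S_k = \sqrt{\sum_{i=1}^{n} \frac{(X_{(k-1)n+i} - \bar{X}_k)^2}{n-1}}$$

Let

$$\bar{S} = \frac{S_1 + \cdots + S_k}{k}$$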
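The standard `statistics` module makes computing $\bar{\bar{X}}$ and $\bar{S}$ from k subgroups straightforward. Its `stdev` function uses the n - 1 divisor, matching the definition of $S_i$. The function name and the tiny data set are hypothetical, chosen so that the answer can be checked by hand.

```python
from statistics import mean, stdev


def subgroup_estimates(subgroups):
    """Return (X-double-bar, S-bar) for a list of equal-size subgroups."""
    xbars = [mean(group) for group in subgroups]     # subgroup averages X-bar_i
    sds = [stdev(group) for group in subgroups]      # sample sds S_i (divisor n - 1)
    return mean(xbars), mean(sds)


# Hypothetical subgroups of size n = 3
print(subgroup_estimates([[1, 2, 3], [4, 5, 6]]))   # prints: (3.5, 1.0)
```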



The statistic $\bar{S}$ is not an unbiased estimator of $\sigma$; that is, $E[\bar{S}] \ne \sigma$. To transform it into an unbiased estimator, we must first compute $E[\bar{S}]$, as follows:

$$E[\bar{S}] = \frac{E[S_1] + \cdots + E[S_k]}{k} = E[S_1] \tag{13.2.1}$$

where the last equality follows since $S_1, \ldots, S_k$ are independent and identically distributed (and thus have the same mean). To compute $E[S_1]$, we make use of the following fundamental result about normal samples, namely, that

$$\frac{(n-1)S_1^2}{\sigma^2} = \frac{\sum_{i=1}^{n} (X_i - \bar X_1)^2}{\sigma^2} \sim \chi^2_{n-1} \tag{13.2.2}$$

Now it is not difficult to show (see Problem 3) that

$$E\left[\sqrt{Y}\right] = \frac{\sqrt{2}\,\Gamma(n/2)}{\Gamma\left(\frac{n-1}{2}\right)} \quad \text{when } Y \sim \chi^2_{n-1} \tag{13.2.3}$$



Since

$$E\left[\sqrt{(n-1)S_1^2/\sigma^2}\,\right] = \frac{\sqrt{n-1}\,E[S_1]}{\sigma}$$

we see from Equations 13.2.2 and 13.2.3 that

$$E[S_1] = \frac{\sqrt{2}\,\Gamma(n/2)\,\sigma}{\sqrt{n-1}\,\Gamma\left(\frac{n-1}{2}\right)}$$

Hence, if we set

$$c(n) = \frac{\sqrt{2}\,\Gamma(n/2)}{\sqrt{n-1}\,\Gamma\left(\frac{n-1}{2}\right)}$$

then it follows from Equation 13.2.1 that $\bar S/c(n)$ is an unbiased estimator of $\sigma$. Table 13.1 presents the values of $c(n)$ for $n = 2$ through $n = 10$.

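TABLE 13.1 Values of c(n)

c(2) = .7978849
c(3) = .8862266
c(4) = .9213181
c(5) = .9399851
c(6) = .9515332
c(7) = .9593684
c(8) = .9650309
c(9) = .9693103
c(10) = .9726596

TECHNICAL REMARK

In determining the values in Table 13.1, the computation of $\Gamma(n/2)$ and $\Gamma\left(\frac{n-1}{2}\right)$ was based on the recursive formula

$$\Gamma(a) = (a-1)\Gamma(a-1)$$

which was established in Section 5.7. This recursion yields that, for integer $n$,

$$\Gamma(n) = (n-1)(n-2)\cdots 3 \cdot 2 \cdot 1 \cdot \Gamma(1) = (n-1)! \quad \text{since } \Gamma(1) = \int_0^\infty e^{-x}\,dx = 1$$

The recursion also yields that

$$\Gamma\left(n + \tfrac{1}{2}\right) = \left(n - \tfrac{1}{2}\right)\left(n - \tfrac{3}{2}\right)\cdots\tfrac{3}{2}\cdot\tfrac{1}{2}\cdot\Gamma\left(\tfrac{1}{2}\right)$$

with

$$\begin{aligned}
\Gamma\left(\tfrac{1}{2}\right) &= \int_0^\infty e^{-x} x^{-1/2}\,dx \\
&= \int_0^\infty e^{-y^2/2}\,\frac{\sqrt{2}}{y}\,y\,dy \qquad \text{by } x = \frac{y^2}{2},\ dx = y\,dy \\
&= \sqrt{2}\int_0^\infty e^{-y^2/2}\,dy \\
&= 2\sqrt{\pi}\,\frac{1}{\sqrt{2\pi}}\int_0^\infty e^{-y^2/2}\,dy \\
&= 2\sqrt{\pi}\,P[N(0,1) > 0] = \sqrt{\pi}
\end{aligned}$$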
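As a cross-check on Table 13.1, $c(n)$ can also be evaluated directly with a library log-gamma function rather than through the recursion above. This is a sketch assuming Python's `math.lgamma`; the function name `c` is only illustrative. In double precision its values differ from the tabulated ones by less than $10^{-6}$.

```python
import math

def c(n: int) -> float:
    """Unbiasing constant c(n) = sqrt(2) * Gamma(n/2) / (sqrt(n-1) * Gamma((n-1)/2)),
    so that E[S] = c(n) * sigma for the sample standard deviation S of n normal values."""
    if n < 2:
        raise ValueError("c(n) needs a subgroup size n >= 2")
    # Work with log-gamma so that large n does not overflow Gamma itself.
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    return math.sqrt(2.0 / (n - 1)) * math.exp(log_ratio)

for n in range(2, 11):
    print(f"c({n}) = {c(n):.7f}")
```

For $n = 2$ the formula reduces to $\sqrt{2}/\Gamma(1/2) = \sqrt{2/\pi}$, which ties back to the value $\Gamma(1/2) = \sqrt{\pi}$ derived above.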



The preceding estimates for $\mu$ and $\sigma$ make use of all $k$ subgroups and thus are reasonable only if the process has remained in control throughout. To check this, we compute the control limits based on these estimates of $\mu$ and $\sigma$, namely,

$$\mathrm{LCL} = \bar{\bar X} - \frac{3\bar S}{\sqrt{n}\,c(n)}, \qquad \mathrm{UCL} = \bar{\bar X} + \frac{3\bar S}{\sqrt{n}\,c(n)} \tag{13.2.4}$$



We now check that each of the subgroup averages $\bar X_i$ falls within these lower and upper limits. Any subgroup whose average value does not fall within the limits is removed (we suppose that the process was temporarily out of control) and the estimates are recomputed. We then again check that all the remaining subgroup averages fall within the control limits. If not, then they are removed, and so on. Of course, if too many of the subgroup averages fall outside the control limits, then it is clear that no control has yet been established.
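The start-up procedure just described (estimate, form the limits of Equation 13.2.4, discard, recompute) can be sketched as a loop. This is an illustration under stated assumptions, not a prescribed implementation: the function name and the `max_rounds` cutoff standing in for "too many subgroups fall outside" are hypothetical choices, and $c(n)$ is computed from the gamma-function formula. The usage lines apply it to the data of Example 13.2b, which follows.

```python
import math

def c(n):
    """Unbiasing constant c(n) with E[S] = c(n) * sigma (Section 13.2.1)."""
    return math.sqrt(2.0 / (n - 1)) * math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2))

def start_up_xbar_chart(xbars, sds, n, max_rounds=20):
    """Estimate mu by the grand average and sigma by S-bar / c(n), form the limits of
    Equation 13.2.4, drop subgroups whose average falls outside, and repeat.
    Returns (mu_hat, sigma_hat, lcl, ucl, kept) with kept as 0-based indices."""
    kept = list(range(len(xbars)))
    for _ in range(max_rounds):
        if not kept:
            raise RuntimeError("every subgroup was removed; no control established")
        k = len(kept)
        grand_mean = sum(xbars[i] for i in kept) / k
        s_bar = sum(sds[i] for i in kept) / k
        half_width = 3 * s_bar / (math.sqrt(n) * c(n))
        lcl, ucl = grand_mean - half_width, grand_mean + half_width
        inside = [i for i in kept if lcl <= xbars[i] <= ucl]
        if len(inside) == k:          # every remaining average is within the limits
            return grand_mean, s_bar / c(n), lcl, ucl, kept
        kept = inside                 # discard out-of-limit subgroups and recompute
    raise RuntimeError("limits still changing after max_rounds passes")

# Example 13.2b: k = 10 subgroups of size n = 4.
xbars = [3.01, 2.97, 3.12, 2.99, 3.03, 3.02, 3.10, 3.14, 3.09, 3.20]
sds = [.12, .14, .08, .11, .09, .08, .15, .16, .13, .16]
mu_hat, sigma_hat, lcl, ucl, kept = start_up_xbar_chart(xbars, sds, n=4)
print(f"mu = {mu_hat:.3f}, sigma = {sigma_hat:.4f}, LCL = {lcl:.3f}, UCL = {ucl:.3f}, kept {len(kept)}")
```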

EXAMPLE 13.2b Let us reconsider Example 13.2a under the new supposition that the process is just beginning and so $\mu$ and $\sigma$ are unknown. Also suppose that the sample standard deviations were as follows:

Subgroup   $\bar X$   $S$     Subgroup   $\bar X$   $S$
1          3.01       .12     6          3.02       .08
2          2.97       .14     7          3.10       .15
3          3.12       .08     8          3.14       .16
4          2.99       .11     9          3.09       .13
5          3.03       .09     10         3.20       .16
Since $\bar{\bar X} = 3.067$, $\bar S = .122$, and $c(4) = .9213$, the control limits are

$$\mathrm{LCL} = 3.067 - \frac{3(.122)}{2 \times .9213} = 2.868$$

$$\mathrm{UCL} = 3.067 + \frac{3(.122)}{2 \times .9213} = 3.266$$

Since all the $\bar X_i$ fall within these limits, we suppose that the process is in control with $\mu = 3.067$ and $\sigma = \bar S/c(4) = .1324$.

Suppose now that the values of the items produced are supposed to fall within the specifications $3 \pm .1$. Assuming that the process remains in control and that the foregoing are accurate estimates of the true mean and standard deviation, what proportion of the items will meet the desired specifications?

SOLUTION To answer the foregoing, we note that when $\mu = 3.067$ and $\sigma = .1324$,

$$\begin{aligned}
P\{2.9 < X < 3.1\} &= P\left\{\frac{2.9 - 3.067}{.1324} < \frac{X - 3.067}{.1324} < \frac{3.1 - 3.067}{.1324}\right\} \\
&= \Phi(.2492) - \Phi(-1.2613) \\
&= .5984 - (1 - .8964) \\
&= .4948
\end{aligned}$$

Hence, about 49 percent of the items produced will meet the specifications. ■
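The specification calculation in the solution amounts to a difference of two normal probabilities. Here is a short sketch, assuming $\Phi$ is evaluated with `math.erf` and using the hypothetical helper name `fraction_within_spec`; because it carries $\sigma = .122/.9213$ unrounded, its result agrees with .4948 only to about four decimal places.

```python
import math

def std_normal_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def fraction_within_spec(mu, sigma, lower, upper):
    """P{lower < X < upper} for X normal with mean mu and standard deviation sigma."""
    return std_normal_cdf((upper - mu) / sigma) - std_normal_cdf((lower - mu) / sigma)

mu_hat = 3.067
sigma_hat = 0.122 / 0.9213        # S-bar / c(4), as in Example 13.2b
print(round(fraction_within_spec(mu_hat, sigma_hat, 2.9, 3.1), 4))
```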

REMARKS 

(a) The estimator $\bar{\bar X}$ is equal to the average of all $nk$ measurements and is thus the obvious estimator of $\mu$. However, it may not immediately be clear why the sample standard deviation of all the $nk$ measurements, namely,

$$S = \sqrt{\frac{\sum_{i=1}^{nk} \left(X_i - \bar{\bar X}\right)^2}{nk - 1}}$$

is not used as the initial estimator of $\sigma$. The reason is that the process may not have been in control throughout the first $k$ subgroups, and thus this latter estimator could be far away from the true value. Also, it often happens that a process goes out of control by an occurrence that results in a change of its mean value $\mu$ while leaving its standard deviation unchanged. In such a case, the subgroup sample standard deviations would still be estimators of $\sigma$, whereas the entire sample standard deviation would not. Indeed, even in the case where the process appears to be in control throughout, the estimator of $\sigma$ presented is preferred over the sample standard deviation $S$. The reason for this is that we cannot be certain that the mean has not changed throughout this time. That is, even though all the subgroup averages fall within the control limits, and so we have concluded that the process is in control, there is no assurance that there are no assignable causes of variation present (which might have resulted in a change in the mean that has not yet been picked up by the chart). It merely means that for practical purposes it pays to act as if the process was in control and let it continue to produce items. However, since we realize that some assignable cause of variation might be present, it has been argued that $\bar S/c(n)$ is a "safer" estimator than the sample standard deviation. That is, although it is not quite as good when the process has really been in control throughout, it could be a lot better if there had been some small shifts in the mean.

(b) In the past, an estimator of $\sigma$ based on subgroup ranges (the difference between the largest and smallest value in the subgroup) has been employed. This was done to keep the necessary computations simple (it is clearly much easier to compute the range than it is to compute the subgroup's sample standard deviation). However, with modern-day computational power this should no longer be a consideration, and since the standard deviation estimator both has smaller variance than the range estimator and is more robust (in the sense that it would still yield a reasonable estimate of the population standard deviation even when the underlying distribution is not normal), we will not consider the latter estimator in this text.

13.3 S-CONTROL CHARTS 

The $\bar X$-control charts presented in the previous section are designed to pick up changes in the population mean. In cases where one is also concerned about possible changes in the population variance, we can utilize an S-control chart.

As before, suppose that, when in control, the items produced have a measurable characteristic that is normally distributed with mean $\mu$ and variance $\sigma^2$. If $S_i$ is the sample standard deviation for the $i$th subgroup, that is,

$$S_i = \sqrt{\frac{\sum_{j=1}^{n} \left(X_{(i-1)n+j} - \bar X_i\right)^2}{n-1}}$$

then, as was shown in Section 13.2.1,

$$E[S_i] = c(n)\sigma \tag{13.3.1}$$

In addition,

$$\begin{aligned}
\mathrm{Var}(S_i) &= E[S_i^2] - (E[S_i])^2 \\
&= \sigma^2 - c^2(n)\sigma^2 \\
&= \sigma^2[1 - c^2(n)]
\end{aligned} \tag{13.3.2}$$

where the next to last equality follows from Equation 13.2.2 and the fact that the expected value of a chi-square random variable is equal to its degrees of freedom parameter.

On using the fact that, when in control, $S_i$ has the distribution of a constant (equal to $\sigma/\sqrt{n-1}$) times the square root of a chi-square random variable with $n - 1$ degrees of freedom, it can be shown that $S_i$ will, with probability near to 1, be within 3 standard deviations of its mean. That is,

$$P\left\{E[S_i] - 3\sqrt{\mathrm{Var}(S_i)} < S_i < E[S_i] + 3\sqrt{\mathrm{Var}(S_i)}\right\} \approx .99$$

Thus, using Equations 13.3.1 and 13.3.2 for $E[S_i]$ and $\mathrm{Var}(S_i)$, it is natural to set the upper and lower control limits for the S chart by

$$\mathrm{UCL} = \sigma\left[c(n) + 3\sqrt{1 - c^2(n)}\right], \qquad \mathrm{LCL} = \sigma\left[c(n) - 3\sqrt{1 - c^2(n)}\right] \tag{13.3.3}$$

The successive values of $S_i$ should be plotted to make certain they fall within the upper and lower control limits. When a value falls outside, the process should be stopped and declared to be out of control.

When one is just starting up a control chart and $\sigma$ is unknown, it can be estimated by $\bar S/c(n)$. Substituting this estimate into Equation 13.3.3, the estimated control limits would then be

$$\mathrm{UCL} = \bar S\left[1 + 3\sqrt{1/c^2(n) - 1}\right], \qquad \mathrm{LCL} = \bar S\left[1 - 3\sqrt{1/c^2(n) - 1}\right] \tag{13.3.4}$$

As in the case of starting up an $\bar X$-control chart, it should then be checked that the $k$ subgroup standard deviations $S_1, S_2, \ldots, S_k$ all fall within these control limits. If any of them falls outside, then those subgroups should be discarded and $\bar S$ recomputed.
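Equations 13.3.3 and 13.3.4 translate directly into two small functions. This is a sketch with hypothetical function names, computing $c(n)$ from the gamma-function formula instead of the rounded table entry. With $\bar S = 4.35$ and $n = 5$ (the values in Example 13.3a, which follows) the limits come out within about .001 of the text's 9.087 and $-.386$. When the lower limit is negative, as here, it can never be crossed, since a standard deviation is nonnegative.

```python
import math

def c(n):
    """c(n) = sqrt(2/(n-1)) * Gamma(n/2) / Gamma((n-1)/2), so that E[S] = c(n) * sigma."""
    return math.sqrt(2.0 / (n - 1)) * math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2))

def s_chart_limits_known_sigma(sigma, n):
    """Equation 13.3.3: (LCL, UCL) for S_i when the in-control sigma is known."""
    cn = c(n)
    half = 3 * math.sqrt(1 - cn ** 2)
    return sigma * (cn - half), sigma * (cn + half)

def s_chart_limits_estimated(s_bar, n):
    """Equation 13.3.4: start-up (LCL, UCL) with sigma estimated by S-bar / c(n)."""
    half = 3 * math.sqrt(1 / c(n) ** 2 - 1)
    return s_bar * (1 - half), s_bar * (1 + half)

lcl, ucl = s_chart_limits_estimated(4.35, 5)
print(f"LCL(S) = {lcl:.3f}, UCL(S) = {ucl:.3f}")
```

Plugging $\sigma = \bar S/c(n)$ into the first function gives the same limits as the second, which is exactly the substitution that produced Equation 13.3.4.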

EXAMPLE 13.3a The following are the $\bar X$ and $S$ values for 20 subgroups of size 5 for a recently started process.

Subgroup  $\bar X$  $S$    Subgroup  $\bar X$  $S$    Subgroup  $\bar X$  $S$    Subgroup  $\bar X$  $S$
1         35.1      4.2    6         36.4      4.5    11        38.1      4.2    16        41.3      8.2
2         33.2      4.4    7         35.9      3.4    12        37.6      3.9    17        35.7      8.1
3         31.7      2.5    8         38.4      5.1    13        38.8      3.2    18        36.3      4.2
4         35.4      3.2    9         35.7      3.8    14        34.3      4.0    19        35.4      4.1
5         34.5      2.6    10        27.2      6.2    15        43.2      3.5    20        34.6      3.7
Since $\bar{\bar X} = 35.94$, $\bar S = 4.35$, and $c(5) = .9400$, we see from Equations 13.2.4 and 13.3.4 that the preliminary upper and lower control limits for $\bar X$ and $S$ are

$$\mathrm{UCL}(\bar X) = 42.149, \qquad \mathrm{LCL}(\bar X) = 29.731, \qquad \mathrm{UCL}(S) = 9.087, \qquad \mathrm{LCL}(S) = -.386$$

FIGURE 13.2 (a) Control chart for $\bar X$; (b) control chart for $S$.

The control charts for $\bar X$ and $S$ with the preceding control limits are shown in Figures 13.2a and 13.2b. Since $\bar X_{10}$ and $\bar X_{15}$ fall outside the $\bar X$ control limits, these subgroups must be eliminated and the control limits recomputed. We leave the necessary computations as an exercise.
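The screening step in this example can be sketched as follows; running it also carries out the recomputation left as an exercise. This is an illustration only: the helper names are hypothetical, and because $c(5)$ is computed exactly (about .93999) rather than rounded to .9400, the limits can differ from those printed above in the third decimal. A subgroup is flagged if either its $\bar X$ or its $S$ falls outside the corresponding trial limits.

```python
import math

def c(n):
    """Unbiasing constant c(n), E[S] = c(n) * sigma."""
    return math.sqrt(2.0 / (n - 1)) * math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2))

n = 5
xbars = [35.1, 33.2, 31.7, 35.4, 34.5, 36.4, 35.9, 38.4, 35.7, 27.2,
         38.1, 37.6, 38.8, 34.3, 43.2, 41.3, 35.7, 36.3, 35.4, 34.6]
sds = [4.2, 4.4, 2.5, 3.2, 2.6, 4.5, 3.4, 5.1, 3.8, 6.2,
       4.2, 3.9, 3.2, 4.0, 3.5, 8.2, 8.1, 4.2, 4.1, 3.7]

def trial_limits(indices):
    """(X-bar limits from Eq. 13.2.4, S limits from Eq. 13.3.4) using the given subgroups."""
    k = len(indices)
    grand_mean = sum(xbars[i] for i in indices) / k
    s_bar = sum(sds[i] for i in indices) / k
    hx = 3 * s_bar / (math.sqrt(n) * c(n))
    hs = 3 * math.sqrt(1 / c(n) ** 2 - 1)
    return (grand_mean - hx, grand_mean + hx), (s_bar * (1 - hs), s_bar * (1 + hs))

def out_of_control(indices):
    """1-based labels of subgroups whose X-bar or S falls outside the trial limits."""
    (xl, xu), (sl, su) = trial_limits(indices)
    return [i + 1 for i in indices
            if not (xl <= xbars[i] <= xu and sl <= sds[i] <= su)]

all_groups = list(range(20))
print("first pass:", out_of_control(all_groups))
remaining = [i for i in all_groups if i + 1 not in (10, 15)]
print("revised limits:", trial_limits(remaining))
print("flagged after revision:", out_of_control(remaining))
```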



13.4 CONTROL CHARTS FOR THE FRACTION DEFECTIVE 

The $\bar X$- and S-control charts can be used when the data are measurements whose values can vary continuously over a region. There are also situations in which the items produced have quality characteristics that are classified as either being defective or nondefective. Control charts can also be constructed in this latter situation.

Let us suppose that when the process is in control each item produced will independently be defective with probability $p$. If we let $X$ denote the number of defective items in a subgroup of $n$ items, then assuming control, $X$ will be a binomial random variable with parameters $(n, p)$. If $F = X/n$ is the fraction of the subgroup that is defective, then assuming the process is in control, its mean and standard deviation are given by

$$E[F] = \frac{E[X]}{n} = \frac{np}{n} = p$$

$$\sqrt{\mathrm{Var}(F)} = \sqrt{\frac{\mathrm{Var}(X)}{n^2}} = \sqrt{\frac{np(1-p)}{n^2}} = \sqrt{\frac{p(1-p)}{n}}$$

Hence, when the process is in control the fraction defective in a subgroup of size $n$ should, with high probability, be between the limits

$$\mathrm{LCL} = p - 3\sqrt{\frac{p(1-p)}{n}}, \qquad \mathrm{UCL} = p + 3\sqrt{\frac{p(1-p)}{n}}$$



The subgroup size $n$ is usually much larger than the typical values of between 4 and 10 used in $\bar X$ and S charts. The main reason is that if $p$ is small and $n$ is not of reasonable size, then most of the subgroups will have zero defects even when the process goes out of control. Thus, it would take longer to detect a shift in quality than it would if $n$ were chosen so that $np$ was not close to zero.

To start such a control chart it is, of course, necessary first to estimate $p$. To do so, choose $k$ of the subgroups, where again one should try to take $k \geq 20$, and let $F_i$ denote the fraction of the $i$th subgroup that are defective. The estimate of $p$ is given by $\bar F$ defined by

$$\bar F = \frac{F_1 + \cdots + F_k}{k}$$
Since $nF_i$ is equal to the number of defectives in subgroup $i$, we see that $\bar F$ can also be expressed as

$$\bar F = \frac{nF_1 + \cdots + nF_k}{nk} = \frac{\text{total number of defectives in all the subgroups}}{\text{number of items in the subgroups}}$$

In other words, the estimate of $p$ is just the proportion of items inspected that are defective. The upper and lower control limits are now given by

$$\mathrm{LCL} = \bar F - 3\sqrt{\frac{\bar F(1-\bar F)}{n}}, \qquad \mathrm{UCL} = \bar F + 3\sqrt{\frac{\bar F(1-\bar F)}{n}}$$

We should now check whether the subgroup fractions $F_1, F_2, \ldots, F_k$ fall within these control limits. If some of them fall outside, then the corresponding subgroups should be eliminated and $\bar F$ recomputed.
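The start-up of a fraction-defective chart follows the same estimate, check, discard loop as before. Below is a sketch with a hypothetical function name and a `max_rounds` cutoff of our own choosing, applied to the screw data of Example 13.4a, which follows. Because it keeps $\bar F = 28/950$ unrounded, the revised upper limit comes out near .1012 rather than the .1013 obtained from the rounded value .0295.

```python
import math

def p_chart_start_up(defectives, n, max_rounds=20):
    """Start-up fraction-defective chart: estimate p by the pooled fraction defective,
    set limits p_hat -/+ 3*sqrt(p_hat*(1 - p_hat)/n), drop subgroups whose fraction
    falls outside, and repeat.  Returns (p_hat, lcl, ucl, removed), where removed
    holds 1-based subgroup labels."""
    kept = list(range(len(defectives)))
    removed = []
    for _ in range(max_rounds):
        if not kept:
            raise RuntimeError("every subgroup was removed; no control established")
        p_hat = sum(defectives[i] for i in kept) / (n * len(kept))
        half = 3 * math.sqrt(p_hat * (1 - p_hat) / n)
        lcl, ucl = p_hat - half, p_hat + half
        outside = [i for i in kept if not lcl <= defectives[i] / n <= ucl]
        if not outside:
            return p_hat, lcl, ucl, removed
        removed += [i + 1 for i in outside]
        kept = [i for i in kept if i not in outside]
    raise RuntimeError("limits still changing after max_rounds passes")

# Example 13.4a: defectives in 20 samples of n = 50 screws.
defectives = [6, 5, 3, 0, 1, 2, 1, 0, 2, 1, 1, 3, 2, 0, 1, 1, 0, 2, 1, 2]
p_hat, lcl, ucl, removed = p_chart_start_up(defectives, n=50)
print(removed, round(p_hat, 4), round(lcl, 4), round(ucl, 4))
```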

EXAMPLE 13.4a Successive samples of 50 screws are drawn from the hourly production of an automatic screw machine, with each screw being rated as either acceptable or defective. This is done for 20 such samples with the following data resulting.

Subgroup   Defectives   F      Subgroup   Defectives   F
1          6            .12    11         1            .02
2          5            .10    12         3            .06
3          3            .06    13         2            .04
4          0            .00    14         0            .00
5          1            .02    15         1            .02
6          2            .04    16         1            .02
7          1            .02    17         0            .00
8          0            .00    18         2            .04
9          2            .04    19         1            .02
10         1            .02    20         2            .04



We can compute the trial control limits as follows:

$$\bar F = \frac{\text{total number of defectives}}{\text{total number of items}} = \frac{34}{1{,}000} = .034$$

$$\mathrm{UCL} = .034 + 3\sqrt{\frac{(.034)(.966)}{50}} = .1109$$

$$\mathrm{LCL} = .034 - 3\sqrt{\frac{(.034)(.966)}{50}} = -.0429$$

Since the proportion of defectives in the first subgroup falls outside the upper control limit, we eliminate that subgroup and recompute $\bar F$ as

$$\bar F = \frac{34 - 6}{950} = .0295$$

The new upper and lower control limits are $.0295 \pm 3\sqrt{(.0295)(1 - .0295)/50}$, or

$$\mathrm{LCL} = -.0423, \qquad \mathrm{UCL} = .1013$$

Since the remaining subgroups all have fraction defectives that fall within the control limits, we can accept that, when in control, the fraction of defective items in a subgroup should be below .1013. ■

REMARK 

Note that we are attempting to detect any change in quality even when this change results 
in improved quality. That is, we regard the process as being "out of control" even when 
the probability of a defective item decreases. The reason for this is that it is important to 
notice any change in quality, for either better or worse, to be able to evaluate the reason 
for the change. In other words, if an improvement in product quality occurs, then it is 
important to analyze the production process to determine the reason for the improvement. 
(That is, what are we doing right?) 

13.5 CONTROL CHARTS FOR NUMBER OF DEFECTS 

In this section, we consider situations in which the data are the numbers of defects in units 
that consist of an item or group of items. For instance, it could be the number of defective 
rivets in an airplane wing, or the number of defective computer chips that are produced 
daily by a given company. Because it is often the case that there are a large number of 
possible things that can be defective, with each of these having a small probability of actually 
being defective, it is probably reasonable to assume that the resulting number of defects 
has a Poisson distribution.* So let us suppose that, when the process is in control, the 
number of defects per unit has a Poisson distribution with mean $\lambda$.

* See Section 5.2 for a theoretical explanation. 
If we let $X_i$ denote the number of defects in the $i$th unit, then, since the variance of a Poisson random variable is equal to its mean, when the process is in control

$$E[X_i] = \lambda, \qquad \mathrm{Var}(X_i) = \lambda$$

Hence, when in control each $X_i$ should with high probability be within $\lambda \pm 3\sqrt{\lambda}$, and so the upper and lower control limits are given by

$$\mathrm{UCL} = \lambda + 3\sqrt{\lambda}, \qquad \mathrm{LCL} = \lambda - 3\sqrt{\lambda}$$

As before, when the control chart is started and $\lambda$ is unknown, a sample of $k$ units should be used to estimate $\lambda$ by

$$\bar X = \frac{X_1 + \cdots + X_k}{k}$$

This results in trial control limits

$$\bar X + 3\sqrt{\bar X} \quad \text{and} \quad \bar X - 3\sqrt{\bar X}$$

If all the $X_i$, $i = 1, \ldots, k$, fall within these limits, then we suppose that the process is in control with $\lambda = \bar X$. If some fall outside, then these points are eliminated and we recompute $\bar X$, and so on.

In situations where the mean number of defects per item (or per day) is small, one should combine items (days) and use as data the number of defects in a given number, say $n$, of items (or days). Since the sum of independent Poisson random variables remains a Poisson random variable, the data values will be Poisson distributed with a larger mean value $\lambda$. Such combining of items is useful when the mean number of defects per item is less than 25.

To obtain a feel for the advantage in combining items, suppose that the mean number 
of defects per item is 4 when the process is under control; and suppose that something 
occurs that results in this value changing from 4 to 6, that is, an increase of 1 standard 
deviation occurs. Let us see how many items will be produced, on average, until the process 
is declared out of control when the successive data consist of the number of defects in n 
items. 

Since the number of defects in a sample of $n$ items is, when under control, Poisson distributed with mean and variance equal to $4n$, the control limits are $4n \pm 3\sqrt{4n}$, or $4n \pm 6\sqrt{n}$. Now if the mean number of defects per item changes to 6, then a data value will be Poisson with mean $6n$, and so the probability that it will fall outside the control limits, call it $p(n)$, is given by

$$p(n) = P\{Y > 4n + 6\sqrt{n}\} + P\{Y < 4n - 6\sqrt{n}\}$$

when $Y$ is Poisson with mean $6n$. Now

$$\begin{aligned}
p(n) &\approx P\{Y > 4n + 6\sqrt{n}\} \\
&= P\left\{\frac{Y - 6n}{\sqrt{6n}} > \frac{6\sqrt{n} - 2n}{\sqrt{6n}}\right\} \\
&\approx P\left\{Z > \sqrt{6} - \sqrt{2n/3}\right\} \qquad \text{where } Z \sim N(0, 1) \\
&= 1 - \Phi\left(\sqrt{6} - \sqrt{2n/3}\right)
\end{aligned}$$

Because each data value will be outside the control limits with probability $p(n)$, it follows that the number of data values needed to obtain one outside the limits is a geometric random variable with parameter $p(n)$, and thus has mean $1/p(n)$. Finally, since there are $n$ items for each data value, it follows that the number of items produced before the process is seen to be out of control has mean value $n/p(n)$:

$$\text{Average number of items produced while out of control} \approx \frac{n}{1 - \Phi\left(\sqrt{6} - \sqrt{2n/3}\right)}$$

We tabulate this for various $n$ in Table 13.2. Since larger values of $n$ are better when the process is in control (because the average number of items produced before the process is incorrectly said to be out of control is approximately $n/.0027$), it is clear from Table 13.2 that one should combine at least 9 of the items. This would mean that each data value (equal to the number of defects in the combined set) would have mean at least $9 \times 4 = 36$.



TABLE 13.2

n     Average Number of Items
1     19.6
2     20.66
3     19.80
4     19.32
5     18.80
6     18.18
7     18.13
8     18.02
9     18
10    18.18
11    18.33
12    18.51
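The normal-approximation formula for the average number of items can be evaluated for each $n$. This is a sketch with a hypothetical function name, using `math.erf` for $\Phi$. Its values are close to the Table 13.2 entries but do not match every digit (for example, about 19.31 at $n = 4$ and exactly 18 at $n = 9$), so treat it as a way to see the shape of the trade-off rather than as a reproduction of the table.

```python
import math

def std_normal_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def mean_items_until_signal(n):
    """Normal approximation n / p(n) with p(n) = 1 - Phi(sqrt(6) - sqrt(2n/3)): mean number
    of items produced, after the defect rate per item rises from 4 to 6, before a
    combined count over n items exceeds the upper limit 4n + 6*sqrt(n)."""
    p_n = 1.0 - std_normal_cdf(math.sqrt(6) - math.sqrt(2 * n / 3))
    return n / p_n

for n in range(1, 13):
    print(n, round(mean_items_until_signal(n), 2))
```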



EXAMPLE 13.5a The following data represent the number of defects discovered at a factory on successive units of 10 cars each.

Unit   Defects    Unit   Defects    Unit   Defects    Unit   Defects
1      141        6      74         11     63         16     68
2      162        7      85         12     74         17     95
3      150        8      95         13     103        18     81
4      111        9      76         14     81         19     102
5      92         10     68         15     94         20     73



Does it appear that the production process was in control throughout?

SOLUTION Since $\bar X = 94.4$, it follows that the trial control limits are

$$\mathrm{LCL} = 94.4 - 3\sqrt{94.4} = 65.25, \qquad \mathrm{UCL} = 94.4 + 3\sqrt{94.4} = 123.55$$


Since the first three data values are larger than UCL, they are removed and the sample mean recomputed. This yields

$$\bar X = \frac{(94.4)(20) - (141 + 162 + 150)}{17} = 84.41$$

and so the new trial control limits are

$$\mathrm{LCL} = 84.41 - 3\sqrt{84.41} = 56.85, \qquad \mathrm{UCL} = 84.41 + 3\sqrt{84.41} = 111.97$$



At this point, since all remaining 17 data values fall within the limits, we could declare that the process is now in control with a mean value of 84.41. However, because it seems that the mean number of defects was initially high before settling into control, it seems quite plausible that the data value $X_4$ also originated before the process was in control. Thus, it would seem prudent in this situation to also eliminate $X_4$ and recompute. Based on the remaining 16 data values, we obtain that

$$\bar X = \frac{1324}{16} = 82.75, \qquad \mathrm{LCL} = 82.75 - 3\sqrt{82.75} = 55.46, \qquad \mathrm{UCL} = 82.75 + 3\sqrt{82.75} = 110.04$$

and so it appears that the process is now in control with a mean value of 82.75.
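The trial limits $\bar X \pm 3\sqrt{\bar X}$ for Poisson defect counts can be recomputed mechanically for each stage of this example. Below is a minimal sketch with a hypothetical function name. It reproduces the three passes exactly as the example performs them (all units, then without units 1 to 3, then also without unit 4) rather than applying an automatic removal rule.

```python
import math

def c_chart_trial_limits(counts):
    """Estimate lambda by the sample mean of the defect counts and return
    (lambda_hat, lambda_hat - 3*sqrt(lambda_hat), lambda_hat + 3*sqrt(lambda_hat))."""
    lam_hat = sum(counts) / len(counts)
    half = 3 * math.sqrt(lam_hat)
    return lam_hat, lam_hat - half, lam_hat + half

# Defects per unit of 10 cars, Example 13.5a (units 1..20).
defects = [141, 162, 150, 111, 92, 74, 85, 95, 76, 68,
           63, 74, 103, 81, 94, 68, 95, 81, 102, 73]

# The passes of the example: keep units first_kept+1 through 20.
for first_kept in (0, 3, 4):
    lam_hat, lcl, ucl = c_chart_trial_limits(defects[first_kept:])
    print(f"units {first_kept + 1}-20: mean {lam_hat:.2f}, LCL {lcl:.2f}, UCL {ucl:.2f}")
```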
13.6 OTHER CONTROL CHARTS FOR DETECTING
CHANGES IN THE POPULATION MEAN

The major weakness of the $\bar X$-control chart presented in Section 13.2 is that it is relatively insensitive to small changes in the population mean. That is, when such a change occurs, since each plotted value is based on only a single subgroup and so tends to have a relatively large variance, it takes, on average, a large number of plotted values to detect the change. One way to remedy this weakness is to allow each plotted value to depend not only on the most recent subgroup average but on some of the other subgroup averages as well. Three approaches for doing this that have been found to be quite effective are based on (1) moving averages, (2) exponentially weighted moving averages, and (3) cumulative sum control charts.

13.6.1 Moving-Average Control Charts 

The moving-average control chart of span size $k$ is obtained by continually plotting the average of the $k$ most recent subgroups. That is, the moving average at time $t$, call it $M_t$, is defined by

$$M_t = \frac{\bar X_t + \bar X_{t-1} + \cdots + \bar X_{t-k+1}}{k}$$

where $\bar X_i$ is the average of the values of subgroup $i$. The successive computations can be easily performed by noting that

$$kM_t = \bar X_t + \bar X_{t-1} + \cdots + \bar X_{t-k+1}$$

and, substituting $t + 1$ for $t$,

$$kM_{t+1} = \bar X_{t+1} + \bar X_t + \cdots + \bar X_{t-k+2}$$

Subtraction now yields that

$$kM_{t+1} - kM_t = \bar X_{t+1} - \bar X_{t-k+1}, \qquad \text{or} \qquad M_{t+1} = M_t + \frac{\bar X_{t+1} - \bar X_{t-k+1}}{k}$$

In words, the moving average at time $t + 1$ is equal to the moving average at time $t$ plus $1/k$ times the difference between the newly added and the deleted value in the moving average. For values of $t$ less than $k$, $M_t$ is defined as the average of the first $t$ subgroups. That is,

$$M_t = \frac{\bar X_1 + \cdots + \bar X_t}{t} \qquad \text{if } t < k$$



Suppose now that when the process is in control the successive values come from a normal population with mean $\mu$ and variance $\sigma^2$. Therefore, if $n$ is the subgroup size, it follows that $\bar X_i$ is normal with mean $\mu$ and variance $\sigma^2/n$. From this we see that the average of $m$ of the $\bar X_i$ will be normal with mean $\mu$ and variance given by $\mathrm{Var}(\bar X_i)/m = \sigma^2/(nm)$ and, therefore, when the process is in control

$$E[M_t] = \mu, \qquad \mathrm{Var}(M_t) = \begin{cases} \sigma^2/(nt) & \text{if } t < k \\ \sigma^2/(nk) & \text{otherwise} \end{cases}$$

Because a normal random variable is almost always within 3 standard deviations of its mean, we have the following upper and lower control limits for $M_t$:

$$\mathrm{UCL} = \begin{cases} \mu + 3\sigma/\sqrt{nt} & \text{if } t < k \\ \mu + 3\sigma/\sqrt{nk} & \text{otherwise} \end{cases} \qquad \mathrm{LCL} = \begin{cases} \mu - 3\sigma/\sqrt{nt} & \text{if } t < k \\ \mu - 3\sigma/\sqrt{nk} & \text{otherwise} \end{cases}$$

In other words, aside from the first $k - 1$ moving averages, the process will be declared out of control whenever a moving average differs from $\mu$ by more than $3\sigma/\sqrt{nk}$.

EXAMPLE 13.6a When a certain manufacturing process is in control, it produces items whose values are normally distributed with mean 10 and standard deviation 2. The following simulated data represent the values of 25 subgroup averages of size 5 from a normal population with mean 11 and standard deviation 2. That is, these data represent the subgroup averages after the process has gone out of control with its mean value increasing from 10 to 11. Table 13.3 presents these 25 values along with the moving averages based on span size $k = 8$ as well as the upper and lower control limits. The lower and upper control limits for $t \geq 8$ are 9.051318 and 10.94868.

As the reader can see, the first moving average to fall outside its control limits occurred at time 11, with other such occurrences at times 12, 13, 14, 16, and 25. (It is interesting to note that the usual control chart, that is, the moving average with $k = 1$, would have declared the process out of control at time 7 since $\bar X_7$ was so large. However, this is the only point where this chart would have indicated a lack of control; see Figure 13.3.)






TABLE 13.3

$t$    $\bar X_t$   $M_t$      LCL        UCL
1      9.617728     9.617728   7.316719   12.68328
2      10.25437     9.936049   8.102634   11.89737
3      9.867195     9.913098   8.450807   11.54919
4      10.79338     10.13317   8.658359   11.34164
5      10.60699     10.22793   8.8        11.2
6      10.48396     10.2706    8.904554   11.09545
7      13.33961     10.70903   8.985815   11.01419
8      9.462969     10.55328   9.051318   10.94868
9      10.14556     10.61926   9.051318   10.94868
10     11.66342     10.79539   9.051318   10.94868
*11    11.55484     11.00634   9.051318   10.94868
*12    11.26203     11.06492   9.051318   10.94868
*13    12.31473     11.27839   9.051318   10.94868
*14    9.220009     11.1204    9.051318   10.94868
15     11.25206     10.85945   9.051318   10.94868
*16    10.48662     10.98741   9.051318   10.94868
17     9.025091     10.84735   9.051318   10.94868
18     9.693386     10.6011    9.051318   10.94868
19     11.45989     10.58923   9.051318   10.94868
20     12.44213     10.73674   9.051318   10.94868
21     11.18981     10.59613   9.051318   10.94868
22     11.56674     10.88947   9.051318   10.94868
23     9.869849     10.71669   9.051318   10.94868
24     12.11311     10.92      9.051318   10.94868
*25    11.48656     11.22768   9.051318   10.94868

* = Out of control.

There is an inverse relationship between the size of the change in the mean value that 
one wants to guard against and the appropriate moving-average span size k. That is, the 
smaller this change is, the larger k ought to be. ■ 
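The moving-average chart, including the running-sum update $kM_{t+1} = kM_t + \bar X_{t+1} - \bar X_{t-k+1}$, fits in a few lines. This is a sketch under stated assumptions: the function name is hypothetical, and the third subgroup average is taken as 9.867195, the value consistent with $M_3 = 9.913098$ and with the data used in Example 13.6c. With $k = 8$ it flags times 11, 12, 13, 14, 16, and 25; with $k = 1$ (the usual chart) it flags only time 7.

```python
import math

def moving_average_chart(xbars, mu, sigma, n, k):
    """Span-k moving-average chart.  M_t is the average of the last min(t, k)
    subgroup averages, kept up to date with a running window sum; its limits are
    mu -/+ 3*sigma/sqrt(n*min(t, k)).
    Returns (M_1..M_T, 1-based times at which M_t falls outside its limits)."""
    ms, flagged = [], []
    window_sum = 0.0
    for t, x in enumerate(xbars, start=1):
        window_sum += x
        if t > k:
            window_sum -= xbars[t - k - 1]   # drop X-bar_{t-k}, which has left the window
        m = min(t, k)
        m_t = window_sum / m
        half = 3 * sigma / math.sqrt(n * m)
        ms.append(m_t)
        if not mu - half <= m_t <= mu + half:
            flagged.append(t)
    return ms, flagged

# Subgroup averages of Example 13.6a (Table 13.3): mu = 10, sigma = 2, n = 5.
xbars = [9.617728, 10.25437, 9.867195, 10.79338, 10.60699, 10.48396, 13.33961,
         9.462969, 10.14556, 11.66342, 11.55484, 11.26203, 12.31473, 9.220009,
         11.25206, 10.48662, 9.025091, 9.693386, 11.45989, 12.44213, 11.18981,
         11.56674, 9.869849, 12.11311, 11.48656]
ms, flagged = moving_average_chart(xbars, mu=10, sigma=2, n=5, k=8)
print("k = 8 flags:", flagged)
print("k = 1 flags:", moving_average_chart(xbars, mu=10, sigma=2, n=5, k=1)[1])
```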



13.6.2 Exponentially Weighted Moving-Average Control Charts 

The moving-average control chart of Section 13.6.1 considered at each time $t$ a weighted average of all subgroup averages up to that time, with the $k$ most recent values being given weight $1/k$ and the others given weight 0. Since this appears to be a most effective procedure for detecting small changes in the population mean, it raises the possibility that other sets of weights might also be successfully employed. One set of weights that is often utilized is obtained by decreasing the weight of each earlier subgroup average by a constant factor.
FIGURE 13.3 Control chart for $\bar X$ (values plotted against $t$, with UCL and LCL).



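Let

$$W_t = \alpha \bar X_t + (1 - \alpha)W_{t-1} \tag{13.6.1}$$

where $\alpha$ is a constant between 0 and 1, and where

$$W_0 = \mu$$

The sequence of values $W_t$, $t = 0, 1, 2, \ldots$ is called an exponentially weighted moving average. To understand why it has been given that name, note that if we continually substitute for the $W$ term on the right side of Equation 13.6.1, we obtain that

$$\begin{aligned}
W_t &= \alpha \bar X_t + (1 - \alpha)\left[\alpha \bar X_{t-1} + (1 - \alpha)W_{t-2}\right] \\
&= \alpha \bar X_t + \alpha(1 - \alpha)\bar X_{t-1} + (1 - \alpha)^2 W_{t-2} \\
&= \alpha \bar X_t + \alpha(1 - \alpha)\bar X_{t-1} + (1 - \alpha)^2\left[\alpha \bar X_{t-2} + (1 - \alpha)W_{t-3}\right] \\
&= \alpha \bar X_t + \alpha(1 - \alpha)\bar X_{t-1} + \alpha(1 - \alpha)^2 \bar X_{t-2} + (1 - \alpha)^3 W_{t-3} \\
&\;\;\vdots \\
&= \alpha \bar X_t + \alpha(1 - \alpha)\bar X_{t-1} + \alpha(1 - \alpha)^2 \bar X_{t-2} + \cdots + \alpha(1 - \alpha)^{t-1}\bar X_1 + (1 - \alpha)^t \mu
\end{aligned} \tag{13.6.2}$$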
where the foregoing used the fact that $W_0 = \mu$. Thus we see from Equation 13.6.2 that $W_t$ is a weighted average of all the subgroup averages up to time $t$, giving weight $\alpha$ to the most recent subgroup and then successively decreasing the weight of earlier subgroup averages by the constant factor $1 - \alpha$, and then giving weight $(1 - \alpha)^t$ to the in-control population mean.

The smaller the value of $\alpha$, the more even the successive weights. For instance, if $\alpha = .1$ then the initial weight is .1 and the successive weights decrease by the factor .9; that is, the weights are .1, .09, .081, .073, .066, .059, and so on. On the other hand, if one chooses, say, $\alpha = .4$, then the successive weights are .4, .24, .144, .086, .052, .... Since the successive weights $\alpha(1 - \alpha)^{i-1}$, $i = 1, 2, \ldots$, can be written as

$$\alpha(1 - \alpha)^{i-1} = \alpha e^{-\beta(i-1)} \qquad \text{where } \beta = \log\left(\frac{1}{1 - \alpha}\right) = -\log(1 - \alpha)$$

we say that the successively older data values are "exponentially weighted" (see Figure 13.4).
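The weights in Equation 13.6.2 can be listed directly. The following sketch (the function name is hypothetical) returns the weights on $\bar X_t, \bar X_{t-1}, \ldots, \bar X_1$ together with the leftover weight $(1 - \alpha)^t$ on $\mu$. It reproduces the sequences quoted above, and the fact that all the weights sum to 1 is what makes $E[W_t] = \mu$ in the computation that follows.

```python
def ewma_weights(alpha, t):
    """Weights alpha*(1 - alpha)**(i - 1), i = 1..t, placed on X-bar_t, X-bar_{t-1}, ...,
    X-bar_1 in Equation 13.6.2, plus the leftover weight (1 - alpha)**t placed on mu."""
    weights = [alpha * (1 - alpha) ** (i - 1) for i in range(1, t + 1)]
    return weights, (1 - alpha) ** t

for alpha in (0.1, 0.4):
    w, w_mu = ewma_weights(alpha, 6)
    print(alpha, [round(x, 3) for x in w], "total weight:", round(sum(w) + w_mu, 12))
```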

To compute the mean and variance of the $W_t$, recall that, when in control, the subgroup averages $\bar X_i$ are independent normal random variables each having mean $\mu$ and variance $\sigma^2/n$. Therefore, using Equation 13.6.2, we see that

$$\begin{aligned}
E[W_t] &= \mu\left[\alpha + \alpha(1 - \alpha) + \alpha(1 - \alpha)^2 + \cdots + \alpha(1 - \alpha)^{t-1} + (1 - \alpha)^t\right] \\
&= \frac{\mu\alpha\left[1 - (1 - \alpha)^t\right]}{1 - (1 - \alpha)} + \mu(1 - \alpha)^t \\
&= \mu
\end{aligned}$$

FIGURE 13.4 Plot of $\alpha(1 - \alpha)^{t-1}$ when $\alpha = .4$
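To determine the variance, we again use Equation 13.6.2:

$$\begin{aligned}
\mathrm{Var}(W_t) &= \frac{\sigma^2}{n}\left\{\alpha^2 + [\alpha(1 - \alpha)]^2 + [\alpha(1 - \alpha)^2]^2 + \cdots + [\alpha(1 - \alpha)^{t-1}]^2\right\} \\
&= \frac{\sigma^2}{n}\,\alpha^2\left[1 + \rho + \rho^2 + \cdots + \rho^{t-1}\right] \qquad \text{where } \rho = (1 - \alpha)^2 \\
&= \frac{\sigma^2}{n}\,\frac{\alpha^2\left[1 - (1 - \alpha)^{2t}\right]}{1 - (1 - \alpha)^2} \\
&= \frac{\sigma^2}{n}\,\frac{\alpha\left[1 - (1 - \alpha)^{2t}\right]}{2 - \alpha}
\end{aligned}$$

Hence, when $t$ is large we see that, provided that the process has remained in control throughout,

$$E[W_t] = \mu, \qquad \mathrm{Var}(W_t) \approx \frac{\sigma^2\alpha}{n(2 - \alpha)} \quad \text{since } (1 - \alpha)^{2t} \approx 0$$

Thus, the upper and lower control limits for $W_t$ are given by

$$\mathrm{UCL} = \mu + 3\sigma\sqrt{\frac{\alpha}{n(2 - \alpha)}}, \qquad \mathrm{LCL} = \mu - 3\sigma\sqrt{\frac{\alpha}{n(2 - \alpha)}}$$

Note that the preceding control limits are the same as those in a moving-average control chart with span $k$ (after the initial $k$ values) when

$$\frac{3\sigma}{\sqrt{nk}} = 3\sigma\sqrt{\frac{\alpha}{n(2 - \alpha)}}$$

or, equivalently, when

$$k = \frac{2 - \alpha}{\alpha} \qquad \text{or} \qquad \alpha = \frac{2}{k + 1}$$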
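An EWMA chart needs only the recursion of Equation 13.6.1 and the large-$t$ limits. The sketch below is illustrative: the function name is hypothetical, and the optional starting value `w0` is an added convenience for charts (such as the one in the next example) whose current value differs from $\mu$. Its usage lines check the equivalence just derived, namely that $\alpha = 2/(k + 1)$ reproduces the span-$k$ moving-average limit $3\sigma/\sqrt{nk}$, here with the settings of Example 13.6a.

```python
import math

def ewma_chart(xbars, mu, sigma, n, alpha, w0=None):
    """EWMA chart: W_t = alpha*X-bar_t + (1 - alpha)*W_{t-1}, starting from W_0 = mu
    unless another current value w0 is supplied.  Uses the large-t limits
    mu -/+ 3*sigma*sqrt(alpha / (n*(2 - alpha))).
    Returns (W_1..W_T, lcl, ucl, 1-based times at which W_t is outside the limits)."""
    half = 3 * sigma * math.sqrt(alpha / (n * (2 - alpha)))
    lcl, ucl = mu - half, mu + half
    w = mu if w0 is None else w0
    ws, flagged = [], []
    for t, x in enumerate(xbars, start=1):
        w = alpha * x + (1 - alpha) * w
        ws.append(w)
        if not lcl <= w <= ucl:
            flagged.append(t)
    return ws, lcl, ucl, flagged

# alpha = 2/(k + 1) should give the same limits as a span-k moving average.
k, mu, sigma, n = 8, 10.0, 2.0, 5
_, lcl, ucl, _ = ewma_chart([], mu, sigma, n, alpha=2 / (k + 1))
print(round(lcl, 6), round(ucl, 6), round(mu + 3 * sigma / math.sqrt(n * k), 6))
```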



EXAMPLE 13.6b A repair shop will send a worker to a caller's home to repair electronic equipment. Upon receiving a request, it dispatches a worker who is instructed to call in when the job is completed. Historical data indicate that the time from when the server is dispatched until he or she calls is a normal random variable with mean 62 minutes and standard deviation 24 minutes. To keep aware of any changes in this distribution, the repair shop plots a standard exponentially weighted moving-average (EWMA) control chart with each data value being the average of 4 successive times, and with a weighting factor of $\alpha = .25$. If the present value of the chart is 60 and the following are the next 16 subgroup averages, what can we conclude?

48, 52, 70, 62, 57, 81, 56, 59, 77, 82, 78, 80, 74, 82, 68, 84

SOLUTION Starting with $W_0 = 60$, the successive values of $W_1, \ldots, W_{16}$ can be obtained from the formula

$$W_t = .25\bar X_t + .75W_{t-1}$$

This gives

$$\begin{aligned}
W_1 &= (.25)(48) + (.75)(60) = 57 \\
W_2 &= (.25)(52) + (.75)(57) = 55.75 \\
W_3 &= (.25)(70) + (.75)(55.75) = 59.31 \\
W_4 &= (.25)(62) + (.75)(59.31) = 59.98 \\
W_5 &= (.25)(57) + (.75)(59.98) = 59.24 \\
W_6 &= (.25)(81) + (.75)(59.24) = 64.68
\end{aligned}$$

and so on, with the following being the values of $W_7$ through $W_{16}$:

62.50, 61.63, 65.48, 69.60, 71.70, 73.78, 73.83, 75.87, 73.90, 76.43

Since

$$3\sqrt{\frac{.25}{1.75}}\,\frac{24}{\sqrt{4}} = 13.61$$

the control limits of the standard EWMA control chart with weighting factor $\alpha = .25$ are

$$\mathrm{LCL} = 62 - 13.61 = 48.39, \qquad \mathrm{UCL} = 62 + 13.61 = 75.61$$

Thus, the EWMA control chart would have declared the system out of control after determining $W_{14}$ (and also after $W_{16}$). On the other hand, since a subgroup standard deviation is $\sigma/\sqrt{n} = 12$, it is interesting that no data value differed from $\mu = 62$ by even as much as 2 subgroup standard deviations, and so the standard $\bar X$-control chart would not have declared the system out of control. ■
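The hand computation in Example 13.6b can be replayed with a short script. This is a sketch with a hypothetical function name. Note that the recursion starts from the chart's present value $W_0 = 60$ while the limits are centered at the in-control mean 62. It flags times 14 and 16, as found above.

```python
import math

def ewma_chart(xbars, mu, sigma, n, alpha, w0):
    """EWMA recursion W_t = alpha*X-bar_t + (1 - alpha)*W_{t-1} from W_0 = w0, with
    limits mu -/+ 3*sigma*sqrt(alpha/(n*(2 - alpha))); returns W's, limits, flagged times."""
    half = 3 * sigma * math.sqrt(alpha / (n * (2 - alpha)))
    lcl, ucl = mu - half, mu + half
    ws, flagged, w = [], [], w0
    for t, x in enumerate(xbars, start=1):
        w = alpha * x + (1 - alpha) * w
        ws.append(w)
        if not lcl <= w <= ucl:
            flagged.append(t)
    return ws, lcl, ucl, flagged

# Example 13.6b: mu = 62, sigma = 24, averages of n = 4 times, alpha = .25, W_0 = 60.
averages = [48, 52, 70, 62, 57, 81, 56, 59, 77, 82, 78, 80, 74, 82, 68, 84]
ws, lcl, ucl, flagged = ewma_chart(averages, mu=62, sigma=24, n=4, alpha=0.25, w0=60)
print([round(w, 2) for w in ws])
print(round(lcl, 2), round(ucl, 2), flagged)
```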
EXAMPLE 13.6c Consider the data of Example 13.6a but now use an exponentially
weighted moving-average control chart with $\alpha = 2/9$. This gives rise to the following
data set.



  t     $x_t$       $W_t$          t     $x_t$       $W_t$
  1     9.617728    9.915051      14     9.220009    10.84522
  2    10.25437     9.990456      15    11.25206     10.93563
  3     9.867195    9.963064      16    10.48662     10.83585
  4    10.79338    10.14758       17     9.025091    10.43346
  5    10.60699    10.24967       18     9.693386    10.269
  6    10.48396    10.30174       19    11.45989     10.53364
 *7    13.33961    10.97682      *20    12.44213     10.95775
  8     9.462969   10.64041      *21    11.18981     11.00932
  9    10.14556    10.53044      *22    11.56674     11.13319
 10    11.66342    10.78221       23     9.869849    10.85245
*11    11.55484    10.95391      *24    12.11311     11.13259
*12    11.26203    11.02238      *25    11.48656     11.21125
*13    12.31473    11.30957

* Out of control.



FIGURE 13.5 Moving-Average Control Chart



Since

$$\text{UCL} = 10.94868, \qquad \text{LCL} = 9.051318$$

we see that the process could be declared out of control as early as $t = 7$
(see Figure 13.5). ■
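As a check on the table, the sketch below recomputes $W_t$ from the tabulated $x_t$ with $\alpha = 2/9$ and flags the values that fall outside the stated limits. The starting value $W_0 = 10$ is our assumption: it is not given here, but it reproduces the tabulated $W_1 = 9.915051$. The limits are copied from the text.

```python
ALPHA = 2 / 9
W0 = 10.0                      # ASSUMED starting value; it reproduces the tabulated W_1
UCL, LCL = 10.94868, 9.051318  # control limits stated in the text

xs = [9.617728, 10.25437, 9.867195, 10.79338, 10.60699, 10.48396, 13.33961,
      9.462969, 10.14556, 11.66342, 11.55484, 11.26203, 12.31473, 9.220009,
      11.25206, 10.48662, 9.025091, 9.693386, 11.45989, 12.44213, 11.18981,
      11.56674, 9.869849, 12.11311, 11.48656]
table_ws = [9.915051, 9.990456, 9.963064, 10.14758, 10.24967, 10.30174, 10.97682,
            10.64041, 10.53044, 10.78221, 10.95391, 11.02238, 11.30957, 10.84522,
            10.93563, 10.83585, 10.43346, 10.269, 10.53364, 10.95775, 11.00932,
            11.13319, 10.85245, 11.13259, 11.21125]

ws = []
w = W0
for x in xs:
    w = ALPHA * x + (1 - ALPHA) * w   # W_t = alpha*x_t + (1 - alpha)*W_{t-1}
    ws.append(w)

# Largest disagreement with the tabulated W_t (should be rounding-sized)
max_diff = max(abs(a - b) for a, b in zip(ws, table_ws))
flagged = [t for t, w_t in enumerate(ws, start=1) if w_t > UCL or w_t < LCL]
print(max_diff < 1e-4, flagged)   # flagged times are the ones starred in the table
```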
13.6.3 Cumulative Sum Control Charts

The major competitor to the moving-average type of control chart for detecting a
small- to moderate-sized change in the mean is the cumulative sum (often abbreviated
to cu-sum) control chart.

Suppose, as before, that $\bar{X}_1, \bar{X}_2, \ldots$ represent successive averages of subgroups of size $n$
and that when the process is in control these random variables have mean $\mu$ and standard
deviation $\sigma/\sqrt{n}$. Initially, suppose that we are only interested in determining when an
increase in the mean value occurs. The (one-sided) cumulative sum control chart for
detecting an increase in the mean operates as follows: Choose positive constants $d$ and $B$,
and let

$$Y_j = \bar{X}_j - \mu - d\sigma/\sqrt{n}, \qquad j \ge 1$$

Note that when the process is in control, and so $E[\bar{X}_j] = \mu$,

$$E[Y_j] = -d\sigma/\sqrt{n} < 0$$

Now, let

$$S_0 = 0, \qquad S_{j+1} = \max\{S_j + Y_{j+1}, 0\}, \quad j \ge 0$$

The cumulative sum control chart having parameters $d$ and $B$ continually plots $S_j$, and
declares that the mean value has increased at the first $j$ such that

$$S_j > B\sigma/\sqrt{n}$$

To understand the rationale behind this control chart, suppose that we had decided
to continually plot the sum of all the random variables $Y_i$ that have been observed so far.
That is, suppose we had decided to plot the successive values of $P_j$, where

$$P_j = \sum_{i=1}^{j} Y_i$$

which can also be written as

$$P_0 = 0, \qquad P_{j+1} = P_j + Y_{j+1}, \quad j \ge 0$$

Now, when the system has always been in control, all of the $Y_j$ have a negative expected
value, and thus we would expect their sum to be negative. Hence, if the value of $P_j$ ever
became large (say, greater than $B\sigma/\sqrt{n}$), then this would be strong evidence that the
process has gone out of control (by having an increase in the mean value of a produced
item). The difficulty, however, is that if the system goes out of control only after some
large time, then the value of $P_j$ at that time will most likely be strongly negative (since up
to then we would have been summing random variables having a negative mean), and thus
it would take a long time for its value to exceed $B\sigma/\sqrt{n}$. Therefore, to keep the sum from
becoming very negative while the process is in control, the cumulative sum control chart
employs the simple trick of resetting its value to 0 whenever it becomes negative. That
is, the quantity $S_j$ is the cumulative sum of all of the $Y_i$ up to time $j$, with the exception
that any time this sum becomes negative its value is reset to 0.

EXAMPLE 13.6d Suppose that the mean and standard deviation of a subgroup average are
$\mu = 30$ and $\sigma/\sqrt{n} = 8$, respectively, and consider the cumulative sum control chart
with $d = .5$, $B = 5$. If the first eight subgroup averages are

29, 33, 35, 42, 36, 44, 43, 45

then the successive values of $Y_j = \bar{X}_j - 30 - 4 = \bar{X}_j - 34$ are

$$Y_1 = -5,\ Y_2 = -1,\ Y_3 = 1,\ Y_4 = 8,\ Y_5 = 2,\ Y_6 = 10,\ Y_7 = 9,\ Y_8 = 11$$

Therefore,

$$\begin{aligned}
S_1 &= \max\{-5, 0\} = 0 \\
S_2 &= \max\{-1, 0\} = 0 \\
S_3 &= \max\{1, 0\} = 1 \\
S_4 &= \max\{9, 0\} = 9 \\
S_5 &= \max\{11, 0\} = 11 \\
S_6 &= \max\{21, 0\} = 21 \\
S_7 &= \max\{30, 0\} = 30 \\
S_8 &= \max\{41, 0\} = 41
\end{aligned}$$

Since the control limit is

$$B\sigma/\sqrt{n} = 5(8) = 40$$

the cumulative sum chart would declare that the mean has increased after observing the
eighth subgroup average. ■
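The one-sided recursion $S_{j+1} = \max\{S_j + Y_{j+1}, 0\}$ takes only a few lines of code. The sketch below reproduces $S_1, \ldots, S_8$ of Example 13.6d and the alarm at the eighth subgroup. The function name `cusum_upper` is ours, chosen for illustration.

```python
def cusum_upper(xbars, mu, sigma_sub, d, B):
    """One-sided cumulative sum chart for detecting an increase in the mean.

    sigma_sub is the standard deviation of a subgroup average (sigma/sqrt(n)).
    Returns the list S_1, S_2, ... and the first j with S_j > B*sigma_sub
    (or None if the chart never signals).
    """
    s = 0.0                          # S_0 = 0
    path, alarm = [], None
    for j, x in enumerate(xbars, start=1):
        y = x - mu - d * sigma_sub   # Y_j = X_j - mu - d*sigma/sqrt(n)
        s = max(s + y, 0.0)          # reset to 0 whenever the sum goes negative
        path.append(s)
        if alarm is None and s > B * sigma_sub:
            alarm = j
    return path, alarm

path, alarm = cusum_upper([29, 33, 35, 42, 36, 44, 43, 45], mu=30, sigma_sub=8, d=0.5, B=5)
print(path, alarm)   # [0.0, 0.0, 1.0, 9.0, 11.0, 21.0, 30.0, 41.0] 8
```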

To detect either a positive or a negative change in the mean, we employ two one-sided
cumulative sum charts simultaneously. We begin by noting that a decrease in $E[\bar{X}_i]$ is
equivalent to an increase in $E[-\bar{X}_i]$. Hence, we can detect a decrease in the mean value
of an item by running a one-sided cumulative sum chart on the negatives of the subgroup
averages. That is, for specified values $d$ and $B$, not only do we plot the quantities $S_j$ as
before, but, in addition, we let

$$W_j = -\bar{X}_j - (-\mu) - d\sigma/\sqrt{n} = \mu - \bar{X}_j - d\sigma/\sqrt{n}$$

and then also plot the values $T_j$, where

$$T_0 = 0, \qquad T_{j+1} = \max\{T_j + W_{j+1}, 0\}, \quad j \ge 0$$

The first time that either $S_j$ or $T_j$ exceeds $B\sigma/\sqrt{n}$, the process is said to be out of control.
Summing up, the following steps result in a cumulative sum control chart for detecting
a change in the mean value of a produced item: Choose positive constants $d$ and $B$; use the
successive subgroup averages to determine the values of $S_j$ and $T_j$; declare the process out
of control the first time that either exceeds $B\sigma/\sqrt{n}$. Three common choices of the pair of
values $d$ and $B$ are $d = .25$, $B = 8.00$; $d = .50$, $B = 4.77$; and $d = 1$, $B = 2.49$. Any
of these choices results in a control rule that has approximately the same false alarm rate as
does the $\bar{X}$-control chart that declares the process out of control the first time a subgroup
average differs from $\mu$ by more than $3\sigma/\sqrt{n}$. As a general rule of thumb, the smaller the
change in mean that one wants to guard against, the smaller should be the chosen value
of $d$.
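The two-sided rule simply runs $S_j$ and $T_j$ side by side. In the sketch below the function name is ours, and the second usage line feeds in a made-up series of averages equal to 20 purely to show a downward alarm; we reuse $d = .5$, $B = 5$ from Example 13.6d so that the first result can be compared with it.

```python
def cusum_two_sided(xbars, mu, sigma_sub, d, B):
    """Two-sided cumulative sum chart: runs S_j (increase) and T_j (decrease).

    Returns (j, direction) for the first j at which S_j or T_j exceeds
    B*sigma_sub, or None if neither ever does.
    """
    s = t = 0.0                                      # S_0 = T_0 = 0
    limit = B * sigma_sub
    for j, x in enumerate(xbars, start=1):
        s = max(s + (x - mu - d * sigma_sub), 0.0)   # Y_j = X_j - mu - d*sigma/sqrt(n)
        t = max(t + (mu - x - d * sigma_sub), 0.0)   # W_j = mu - X_j - d*sigma/sqrt(n)
        if s > limit:
            return j, "increase"
        if t > limit:
            return j, "decrease"
    return None

print(cusum_two_sided([29, 33, 35, 42, 36, 44, 43, 45], 30, 8, 0.5, 5))  # (8, 'increase')
print(cusum_two_sided([20] * 10, 30, 8, 0.5, 5))   # hypothetical drop: (7, 'decrease')
```

For the hypothetical series each $W_j = 30 - 20 - 4 = 6$, so $T_j = 6j$ first exceeds 40 at $j = 7$.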



Problems 

1. Assume that items produced are supposed to be normally distributed with mean 
35 and standard deviation 3. To monitor this process, subgroups of size 5 are 
sampled. If the following represents the averages of the first 20 subgroups, does it 
appear that the process was in control? 



Subgroup No.   $\bar{X}$     Subgroup No.   $\bar{X}$
 1             34.0          11             35.8
 2             31.6          12             35.8
 3             30.8          13             34.0
 4             33.0          14             35.0
 5             35.0          15             33.8
 6             32.2          16             31.6
 7             33.0          17             33.0
 8             32.6          18             33.2
 9             33.8          19             31.8
10             35.8          20             35.6



2. Suppose that a process is in control with $\mu = 14$ and $\sigma = 2$. An $\bar{X}$-control chart
based on subgroups of size 5 is employed. If a shift in the mean of 2.2 units occurs,
what is the probability that the next subgroup average will fall outside the control
limits? On average, how many subgroups will have to be looked at in order to
detect this shift?

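3. If $Y$ has a chi-square distribution with $n - 1$ degrees of freedom, show that

$$E[\sqrt{Y}] = \sqrt{2}\,\frac{\Gamma(n/2)}{\Gamma[(n-1)/2]}$$

(Hint: Write

$$\begin{aligned}
E[\sqrt{Y}] &= \int_0^\infty \sqrt{y}\,\chi^2_{n-1}(y)\,dy \\
&= \int_0^\infty \sqrt{y}\,\frac{e^{-y/2}\,y^{(n-1)/2-1}}{2^{(n-1)/2}\,\Gamma\!\left[\frac{n-1}{2}\right]}\,dy \\
&= \int_0^\infty \frac{e^{-y/2}\,y^{n/2-1}}{2^{(n-1)/2}\,\Gamma\!\left[\frac{n-1}{2}\right]}\,dy
\end{aligned}$$

Now make the transformation $x = y/2$.)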

4. Samples of size 5 are taken at regular intervals from a production process, and
the values of the sample averages and sample standard deviations are calculated.
Suppose that the sums of the $\bar{X}$ and $S$ values for the first 25 samples are given by

$$\sum \bar{X}_i = 357.2, \qquad \sum S_i = 4.$$

(a) Assuming control, determine the control limits for an $\bar{X}$-control chart.

(b) Suppose that the measurable values of the items produced are supposed to be
within the limits $14.3 \pm .45$. Assuming that the process remains in control
with a mean and variance that are approximately equal to the estimates derived,
approximately what percentage of the items produced will fall within the
specification limits?

5. Determine the revised $\bar{X}$- and $S$-control limits for the data in Example 13.3a.

6. In Problem 4, determine the control limits for an S-control chart. 

7. The following are $\bar{X}$ and $S$ values for 20 subgroups of size 5.



Subgroup  $\bar{X}$   S      Subgroup  $\bar{X}$   S      Subgroup  $\bar{X}$   S
 1        33.8   5.1      8        36.1   4.1     15        35.6   4.8
 2        37.2   5.4      9        38.2   7.3     16        36.4   4.6
 3        40.4   6.1     10        32.4   6.6     17        37.2   6.1
 4        39.3   5.5     11        29.7   5.1     18        31.3   5.7
 5        41.1   5.2     12        31.6   5.3     19        33.6   5.5
 6        40.4   4.8     13        38.4   5.8     20        36.7   4.2
 7        35.0   5.0     14        40.2   6.4








(a) Determine trial control limits for an $\bar{X}$-control chart.

(b) Determine trial control limits for an $S$-control chart.

(c) Does it appear that the process was in control throughout? 

(d) If your answer in part (c) is no, suggest values for upper and lower control 
limits to be used with succeeding subgroups. 

(e) If each item is supposed to have a value within 35 ± 10, what is your estimate 
of the percentage of items that will fall within this specification? 

8. Control charts for $\bar{X}$ and $S$ are maintained on the shear strength of spot welds.
After 30 subgroups of size 4, $\sum \bar{X}_i = 12{,}660$ and $\sum S_i = 500$. Assume that the
process is in control.

(a) What are the $\bar{X}$-control limits?

(b) What are the S-control limits? 

(c) Estimate the standard deviation for the process. 

(d) If the minimum specification for this weld is 400 pounds, what percentage of 
the welds will not meet the minimum specification? 

9. Control charts for $\bar{X}$ and $S$ are maintained on resistors (in ohms). The subgroup size
is 4. The values of $\bar{X}$ and $S$ are computed for each subgroup. After 20 subgroups,
$\sum \bar{X}_i = 8{,}620$ and $\sum S_i = 450$.

(a) Compute the values of the limits for the $\bar{X}$ and $S$ charts.

(b) Estimate the value of $\sigma$ on the assumption that the process is in statistical
control.

(c) If the specification limits are $430 \pm 30$, what conclusions can you draw
regarding the ability of the process to produce items within these specifications?

(d) If $\mu$ is increased by 60, what is the probability of a subgroup average falling
outside the control limits?

10. The following data refer to the amounts by which the diameters of ^-inch ball 
bearings differ from 7-inch in units of .001 inches. The subgroup size is n = 5. 



Subgroup   Data Values
 1          2.5    .5    2.0   -1.2    1.4
 2           .2    .3     .5    1.1    1.5
 3          1.5   1.3    1.2   -1.0     .7
 4           .2    .5   -2.0     .0   -1.3
 5          -.2    .1     .3    -.6     .5
 6          1.1   -.5     .6     .5     .2
 7          1.1  -1.0   -1.2    1.3     .1
 8           .2  -1.5    -.5    1.5     .3
 9         -2.0  -1.5    1.6    1.4     .1
10          -.5   5.2    -.1   -1.0   -1.5
11           .1   1.5    -.2     .3    2.1
12           .0  -2.0    -.5     .6    -.5
13         -1.0   -.5    -.5   -1.0     .2
14           .5   1.3   -1.2    -.5   -2.7
15          1.1    .8    1.5   -1.5    1.2


(a) Set up trial control limits for $\bar{X}$- and $S$-control charts.

(b) Does the process appear to have been in control throughout the sampling? 

(c) If the answer to part (b) is no, construct revised control limits. 

11. Samples of $n = 6$ items are taken from a manufacturing process at regular
intervals. A normally distributed quality characteristic is measured, and $\bar{X}$ and
$S$ values are calculated for each sample. After 50 subgroups have been analyzed, we
have

$$\sum_{i=1}^{50} \bar{X}_i = 970 \quad\text{and}\quad \sum_{i=1}^{50} S_i = 85$$

(a) Compute the control limits for the $\bar{X}$- and $S$-control charts. Assume that all
points on both charts plot within the control limits.

(b) If the specification limits are 19 ± 4.0, what are your conclusions regarding 
the ability of the process to produce items conforming to specifications? 

12. The following data present the number of defective bearing and seal assemblies in 
samples of size 100. 
Sample   Number of     Sample   Number of
Number   Defectives    Number   Defectives
 1        5            11        4
 2        2            12       10
 3        1            13        0
 4        5            14        8
 5        9            15        3
 6        4            16        6
 7        3            17        2
 8        3            18        1
 9        2            19        6
10        5            20       10



Does it appear that the process was in control throughout? If not, determine revised 
control limits if possible. 

13. The following data represent the results of inspecting all personal computers 
produced at a given plant during the last 12 days. 



Day   Number of Units   Number Defective
 1          80                 5
 2         110                 7
 3          90                 4
 4          80                 9
 5         100                12
 6          90                10
 7          80                 4
 8          70                 3
 9          80                 5
10          90                 6
11          90                 5
12         110                 7



Does the process appear to have been in control? Determine control limits for 
future production. 

14. Suppose that when a process is in control each item will be defective with probability
.04. Suppose that your control chart calls for taking daily samples of size 500.
What is the probability that, if the probability of a defective item should suddenly 
shift to .08, your control chart would detect this shift on the next sample? 

15. The following data represent the number of defective chips produced on the last 
15 days: 121, 133, 98, 85, 101, 78, 66, 82, 90, 78, 85, 81, 100, 75, 89. Would
you conclude that the process has been in control throughout these 15 days? What
control limits would you advise using for future production?

16. Surface defects have been counted on 25 rectangular steel plates, and the data are 
shown below. Set up a control chart. Does the process producing the plates appear 
to be in statistical control? 





Plate    Number of    Plate    Number of
Number   Defects      Number   Defects
 1        2           14       10
 2        3           15        2
 3        4           16        2
 4        3           17        6
 5        1           18        5
 6        2           19        4
 7        5           20        6
 8        0           21        3
 9        2           22        7
10        5           23        0
11        1           24        2
12        7           25        4
13        8







17. The following data represent 25 successive subgroup averages and moving averages
of span size 5 of these subgroup averages. The data are generated by a process that, 
when in control, produces normally distributed items having mean 30 and variance 
40. The subgroups are of size 4. Would you judge that the process has been in 
control throughout? 



 t    $x_t$      $M_t$          t    $x_t$      $M_t$
 1   35.62938   35.62938      14   35.80945   32.34106
 2   39.13018   37.37978      15   30.9136    33.1748
 3   29.45974   34.73976      16   30.54829   32.47771
 4   32.5872    34.20162      17   36.39414   33.17019
 5   30.06041   33.37338      18   27.62703   32.2585
 6   26.54353   31.55621      19   34.02624   31.90186
 7   37.75199   31.28057      20   27.81629   31.2824
 8   26.88128   30.76488      21   26.99926   30.57259
 9   32.4807    30.74358      22   32.44703   29.78317
10   26.7449    30.08048      23   38.53433   31.96463
11   34.03377   31.57853      24   28.53698   30.86678
12   32.93174   30.61448      25   28.65725   31.03497
13   32.18547   31.67531
18. The data shown below give subgroup averages and moving averages of the values
from Problem 17. The span of the moving averages is $k = 8$. When in control the
subgroup averages are normally distributed with mean 50 and variance 5. What
can you conclude?



 t    $x_t$      $M_t$
 1   50.79806   50.79806
 2   46.21413   48.50609
 3   51.85793   49.62337
 4   50.27771   49.78696
 5   53.81512   50.59259
 6   50.67635   50.60655
 7   51.39083   50.71859
 8   51.65246   50.83533
 9   52.15607   51.00508
10   54.57523   52.05022
11   53.08497   52.2036
12   55.02968   52.79759
13   54.25338   52.85237
14   50.48405   52.82834
15   50.34928   52.69814
16   50.86896   52.6002
17   52.03695   52.58531
18   53.23255   52.41748
19   48.12588   51.79759
20   52.23154   51.44783



19. Redo Problem 17 by employing an exponential weighted moving average control 
chart with a = -? . 

20. Analyze the data of Problem 18 with an exponential weighted moving- average 
control chart having a = | . 

21. Explain why a moving-average control chart with span size $k$ must use different
control limits for the first $k - 1$ moving averages, whereas an exponentially weighted
moving-average control chart can use the same control limits throughout. [Hint:
Argue that $\mathrm{Var}(M_t)$ decreases in $t$, whereas $\mathrm{Var}(W_t)$ increases, and explain why
this is relevant.]

22. Repeat Problem 17, this time using a cumulative sum control chart with 

(a) $d = .25$, $B = 8$;

(b) $d = .5$, $B = 4.77$.

23. Repeat Problem 18, this time using a cumulative sum control chart with d = 1 
and B = 2.49. 
LIFE TESTING



14.1 INTRODUCTION 

In this chapter, we consider a population of items having lifetimes that are assumed to 
be independent random variables with a common distribution that is specified up to an 
unknown parameter. The problem of interest will be to use whatever data are available to 
estimate this parameter. 

In Section 14.2, we introduce the concept of the hazard (or failure) rate function — a 
useful engineering concept that can be utilized to specify lifetime distributions. In 
Section 14.3, we suppose that the underlying life distribution is exponential and show 
how to obtain estimates (point, interval, and Bayesian) of its mean under a variety of 
sampling plans. In Section 14.4, we develop a test of the hypothesis that two exponentially
distributed populations have a common mean. In Section 14.5, we consider two
approaches to estimating the parameters of a Weibull distribution. 



14.2 HAZARD RATE FUNCTIONS 

Consider a positive continuous random variable $X$ that we interpret as being the lifetime
of some item, having distribution function $F$ and density $f$. The hazard rate (sometimes
called the failure rate) function $\lambda(t)$ of $F$ is defined by

$$\lambda(t) = \frac{f(t)}{1 - F(t)}$$



To interpret $\lambda(t)$, suppose that the item has survived for $t$ hours and we desire
the probability that it will not survive for an additional time $dt$. That is, consider
$P\{X \in (t, t + dt) \mid X > t\}$. Now

$$\begin{aligned}
P\{X \in (t, t + dt) \mid X > t\} &= \frac{P\{X \in (t, t + dt), X > t\}}{P\{X > t\}} \\
&= \frac{P\{X \in (t, t + dt)\}}{P\{X > t\}} \\
&\approx \frac{f(t)}{1 - F(t)}\,dt
\end{aligned}$$

That is, $\lambda(t)$ represents the conditional probability intensity that an item of age $t$ will fail
in the next moment.

Suppose now that the lifetime distribution is exponential. Then, by the memoryless
property of the exponential distribution, it follows that the distribution of remaining life
for a $t$-year-old item is the same as for a new item. Hence $\lambda(t)$ should be constant, which
is verified as follows:

$$\lambda(t) = \frac{f(t)}{1 - F(t)} = \frac{\lambda e^{-\lambda t}}{e^{-\lambda t}} = \lambda$$

Thus, the failure rate function for the exponential distribution is constant. The parameter
$\lambda$ is often referred to as the rate of the distribution.

We now show that the failure rate function $\lambda(t)$, $t > 0$, uniquely determines the
distribution $F$. To show this, note that by definition

$$\lambda(s) = \frac{f(s)}{1 - F(s)} = \frac{\frac{d}{ds}F(s)}{1 - F(s)} = \frac{d}{ds}\{-\log[1 - F(s)]\}$$

Integrating both sides of this equation from 0 to $t$ yields

$$\begin{aligned}
\int_0^t \lambda(s)\,ds &= -\log[1 - F(t)] + \log[1 - F(0)] \\
&= -\log[1 - F(t)] \qquad \text{since } F(0) = 0
\end{aligned}$$
which implies that

$$1 - F(t) = \exp\left\{-\int_0^t \lambda(s)\,ds\right\} \tag{14.2.1}$$

Hence a distribution function of a positive continuous random variable can be specified
by giving its hazard rate function. For instance, if a random variable has a linear hazard
rate function, that is, if

$$\lambda(t) = a + bt$$

then its distribution function is given by

$$F(t) = 1 - e^{-at - bt^2/2}$$

and differentiation yields that its density is

$$f(t) = (a + bt)e^{-(at + bt^2/2)}, \qquad t \ge 0$$

When $a = 0$, the foregoing is known as the Rayleigh density function.
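A quick numerical check of Equation 14.2.1 for the linear hazard: the sketch evaluates $F$ and $f$ from the formulas above, confirms that $f(t)/[1 - F(t)]$ gives back $a + bt$, and recovers $1 - F(t)$ by integrating the hazard with the midpoint rule. The parameter values $a = 0.2$, $b = 0.5$ and the helper names are arbitrary illustrative choices.

```python
import math

def linear_hazard_cdf(t, a, b):
    """F(t) = 1 - exp(-(a t + b t^2 / 2)) for the hazard lambda(t) = a + b t."""
    return 1.0 - math.exp(-(a * t + b * t * t / 2))

def linear_hazard_pdf(t, a, b):
    """f(t) = (a + b t) exp(-(a t + b t^2 / 2))."""
    return (a + b * t) * math.exp(-(a * t + b * t * t / 2))

def survival_from_hazard(hazard, t, steps=1000):
    """1 - F(t) = exp(-integral_0^t hazard(s) ds), Equation 14.2.1,
    with the integral approximated by the midpoint rule."""
    h = t / steps
    integral = h * sum(hazard((k + 0.5) * h) for k in range(steps))
    return math.exp(-integral)

a, b = 0.2, 0.5    # arbitrary illustrative parameters
for t in [0.5, 1.0, 2.0, 3.0]:
    ratio = linear_hazard_pdf(t, a, b) / (1.0 - linear_hazard_cdf(t, a, b))
    surv = survival_from_hazard(lambda s: a + b * s, t)
    print(t, round(ratio, 6), round(a + b * t, 6), round(surv, 6),
          round(1 - linear_hazard_cdf(t, a, b), 6))
```

The midpoint rule is exact for a linear integrand, so the two survival columns agree up to floating-point rounding.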

EXAMPLE 14.2a One often hears that the death rate of a person who smokes is, at each
age, twice that of a nonsmoker. What does this mean? Does it mean that a nonsmoker has
twice the probability of surviving a given number of years as does a smoker of the same
age?

SOLUTION If $\lambda_s(t)$ denotes the hazard rate of a smoker of age $t$ and $\lambda_n(t)$ that of a
nonsmoker of age $t$, then the foregoing is equivalent to the statement that

$$\lambda_s(t) = 2\lambda_n(t)$$

The probability that an $A$-year-old nonsmoker will survive until age $B$, $A < B$, is

$$\begin{aligned}
P\{A\text{-year-old nonsmoker reaches age } B\}
&= P\{\text{nonsmoker's lifetime} > B \mid \text{nonsmoker's lifetime} > A\} \\
&= \frac{1 - F_{\text{non}}(B)}{1 - F_{\text{non}}(A)} \\
&= \frac{\exp\left\{-\int_0^B \lambda_n(t)\,dt\right\}}{\exp\left\{-\int_0^A \lambda_n(t)\,dt\right\}} \qquad \text{from Equation 14.2.1} \\
&= \exp\left\{-\int_A^B \lambda_n(t)\,dt\right\}
\end{aligned}$$

whereas the corresponding probability for a smoker is, by the same reasoning,

$$\begin{aligned}
P\{A\text{-year-old smoker reaches age } B\} &= \exp\left\{-\int_A^B \lambda_s(t)\,dt\right\} \\
&= \exp\left\{-2\int_A^B \lambda_n(t)\,dt\right\} \\
&= \left[\exp\left\{-\int_A^B \lambda_n(t)\,dt\right\}\right]^2
\end{aligned}$$

In other words, of two individuals of the same age, one of whom is a smoker and the
other a nonsmoker, the probability that the smoker survives to any given age is the square
(not one-half) of the corresponding probability for a nonsmoker. For instance, if
$\lambda_n(t) = 1/20$, $50 < t < 60$, then the probability that a 50-year-old nonsmoker reaches age 60 is
$e^{-1/2} = .607$, whereas the corresponding probability for a smoker is $e^{-1} = .368$. ■
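The closing numbers of Example 14.2a follow from a one-line computation. The sketch below (the helper name is ours) also confirms that the smoker's survival probability is the square of the nonsmoker's.

```python
import math

def survival_prob(hazard_integral):
    # P{survive from age A to age B} = exp(-integral of the hazard over (A, B))
    return math.exp(-hazard_integral)

# Hazard 1/20 on (50, 60): the integral for a nonsmoker is 10 * (1/20) = 0.5
nonsmoker = survival_prob(10 * (1 / 20))
smoker = survival_prob(2 * 10 * (1 / 20))   # the smoker's hazard is twice as large
print(round(nonsmoker, 3), round(smoker, 3), math.isclose(smoker, nonsmoker ** 2))
# 0.607 0.368 True
```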

REMARK ON TERMINOLOGY 

We will say that $X$ has failure rate function $\lambda(t)$ when more precisely we mean that the
distribution function of $X$ has failure rate function $\lambda(t)$.

14.3 THE EXPONENTIAL DISTRIBUTION IN LIFE TESTING 
14.3.1 Simultaneous Testing: Stopping at the rth Failure

Suppose that we are testing items whose life distribution is exponential with unknown
mean $\theta$. We put $n$ independent items simultaneously on test and stop the experiment
when there have been a total of $r$, $r \le n$, failures. The problem is to then use the observed
data to estimate the mean $\theta$.

The observed data will be the following:

$$\text{Data: } x_1 < x_2 < \cdots < x_r,\quad i_1, i_2, \ldots, i_r \tag{14.3.1}$$

with the interpretation that the $j$th item to fail was item $i_j$ and it failed at time $x_j$. Thus, if
we let $X_i$, $i = 1, \ldots, n$, denote the lifetime of component $i$, then the data will be as given
in Equation 14.3.1 if

$$X_{i_1} = x_1,\ X_{i_2} = x_2,\ \ldots,\ X_{i_r} = x_r, \quad \text{and the other } n - r \text{ of the } X_j \text{ are all greater than } x_r$$

Now the probability density of $X_{i_j}$ is

$$f_{X_{i_j}}(x_j) = \frac{1}{\theta}e^{-x_j/\theta}, \qquad j = 1, \ldots, r$$
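and so, by independence, the joint probability density of $X_{i_j}$, $j = 1, \ldots, r$, is

$$f_{X_{i_1}, \ldots, X_{i_r}}(x_1, \ldots, x_r) = \prod_{j=1}^{r} \frac{1}{\theta}e^{-x_j/\theta}$$

Also, the probability that the other $n - r$ of the $X$'s are all greater than $x_r$ is, again using
independence,

$$P\{X_j > x_r \text{ for } j \ne i_1 \text{ or } i_2 \ldots \text{ or } i_r\} = \left(e^{-x_r/\theta}\right)^{n-r}$$

Hence, we see that the likelihood of the observed data, call it $L(x_1, \ldots, x_r, i_1, \ldots, i_r)$,
is, for $x_1 < x_2 < \cdots < x_r$,

$$\begin{aligned}
L(x_1, \ldots, x_r, i_1, \ldots, i_r)
&= f_{X_{i_1}, X_{i_2}, \ldots, X_{i_r}}(x_1, \ldots, x_r)\,P\{X_j > x_r,\ j \ne i_1, \ldots, i_r\} \\
&= \frac{1}{\theta}e^{-x_1/\theta} \cdots \frac{1}{\theta}e^{-x_r/\theta}\,e^{-x_r(n-r)/\theta} \\
&= \frac{1}{\theta^r}\exp\left\{-\frac{\sum_{i=1}^{r} x_i + (n - r)x_r}{\theta}\right\}
\end{aligned} \tag{14.3.2}$$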



REMARK

The likelihood in Equation 14.3.2 not only specifies that the first $r$ failures occur at
times $x_1 < x_2 < \cdots < x_r$ but also that the $r$ items to fail were, in order, $i_1, i_2, \ldots, i_r$.
If we only desired the density function of the first $r$ failure times, then since there are
$n(n-1)\cdots(n-(r-1)) = n!/(n-r)!$ possible (ordered) choices of the first $r$ items to
fail, it follows that the joint density is, for $x_1 < x_2 < \cdots < x_r$,

$$f(x_1, x_2, \ldots, x_r) = \frac{n!}{(n-r)!}\,\frac{1}{\theta^r}\exp\left\{-\frac{\sum_{i=1}^{r} x_i + (n - r)x_r}{\theta}\right\}$$



To obtain the maximum likelihood estimator of $\theta$, we take the logarithm of both sides
of Equation 14.3.2. This yields

$$\log L(x_1, \ldots, x_r, i_1, \ldots, i_r) = -r\log\theta - \frac{\sum_{i=1}^{r} x_i + (n - r)x_r}{\theta}$$

and

$$\frac{\partial}{\partial\theta}\log L(x_1, \ldots, x_r, i_1, \ldots, i_r) = -\frac{r}{\theta} + \frac{\sum_{i=1}^{r} x_i + (n - r)x_r}{\theta^2}$$

Equating to 0 and solving yields that $\hat\theta$, the maximum likelihood estimate, is given by

$$\hat\theta = \frac{\sum_{i=1}^{r} x_i + (n - r)x_r}{r}$$

Hence, if we let $X_{(i)}$ denote the time at which the $i$th failure occurs ($X_{(i)}$ is called the $i$th
order statistic), then the maximum likelihood estimator of $\theta$ is

$$\hat\theta = \frac{\sum_{i=1}^{r} X_{(i)} + (n - r)X_{(r)}}{r} = \frac{\tau}{r} \tag{14.3.3}$$



where $\tau$, defined to equal the numerator in Equation 14.3.3, is called the total-time-on-test
statistic. We call it this since the $i$th item to fail functions for a time $X_{(i)}$ (and then fails),
$i = 1, \ldots, r$, whereas the other $n - r$ items function throughout the test (which lasts for
a time $X_{(r)}$). Hence the sum of the times that all the items are on test is equal to $\tau$.
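The estimator in Equation 14.3.3 is the total time on test divided by the number of failures. A minimal sketch follows; the helper name and the failure times are made up for illustration.

```python
def total_time_on_test(failure_times, n):
    """Return (tau, theta_hat) for a test of n items stopped at the r-th failure.

    failure_times -- the r observed failure times (any order)
    n             -- number of items put on test (n >= r)
    """
    x = sorted(failure_times)
    r = len(x)
    if r == 0 or r > n:
        raise ValueError("need 1 <= r <= n failures")
    # Failed items contribute their own lifetimes; the n - r survivors each
    # contribute x_(r), the time at which the test stops.
    tau = sum(x) + (n - r) * x[-1]
    return tau, tau / r

# Hypothetical data: 10 items on test, stopped at the 4th failure
tau, theta_hat = total_time_on_test([3.0, 7.5, 1.2, 9.0], n=10)
print(tau, theta_hat)
```

Here $\tau = 20.7 + 6(9.0) = 74.7$ and $\hat\theta = 74.7/4 = 18.675$.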

To obtain a confidence interval for $\theta$, we will determine the distribution of $\tau$, the total
time on test. Recalling that $X_{(i)}$ is the time of the $i$th failure, $i = 1, \ldots, r$, we will start
by rewriting the expression for $\tau$. To write an expression for $\tau$, rather than summing the
total time on test of each of the items, let us ask how much additional time on test was
generated between each successive failure. That is, let us denote by $Y_i$, $i = 1, \ldots, r$, the
additional time on test generated between the $(i-1)$st and $i$th failure. Now up to the first
$X_{(1)}$ time units (as all $n$ items are functioning throughout this interval), the total time on
test is

$$Y_1 = nX_{(1)}$$

Between the first and second failures, there are a total of $n - 1$ functioning items, and so

$$Y_2 = (n - 1)(X_{(2)} - X_{(1)})$$

In general, we have

$$\begin{aligned}
Y_1 &= nX_{(1)} \\
Y_2 &= (n - 1)(X_{(2)} - X_{(1)}) \\
&\ \ \vdots \\
Y_j &= (n - j + 1)(X_{(j)} - X_{(j-1)}) \\
&\ \ \vdots \\
Y_r &= (n - r + 1)(X_{(r)} - X_{(r-1)})
\end{aligned}$$

and

$$\tau = \sum_{j=1}^{r} Y_j$$



The importance of the foregoing representation for $\tau$ follows from the fact that the
distributions of the $Y_j$'s are easily obtained as follows. Since $X_{(1)}$, the time of the first
failure, is the minimum of $n$ independent exponential lifetimes, each having rate $1/\theta$, it
follows from Proposition 5.6.1 that it is itself exponentially distributed with rate $n/\theta$. That
is, $X_{(1)}$ is exponential with mean $\theta/n$, and so $nX_{(1)}$ is exponential with mean $\theta$. Also, at the
moment when the first failure occurs, the remaining $n - 1$ functioning items are, by the
memoryless property of the exponential, as good as new and so each will have an additional
life that is exponential with mean $\theta$; hence, the additional time until one of them fails is
exponential with rate $(n - 1)/\theta$. That is, independent of $X_{(1)}$, $X_{(2)} - X_{(1)}$ is exponential
with mean $\theta/(n - 1)$ and so $Y_2 = (n - 1)(X_{(2)} - X_{(1)})$ is exponential with mean $\theta$. Indeed,
continuing this argument leads us to the following conclusion:

$$Y_1, \ldots, Y_r \text{ are independent exponential random variables each having mean } \theta \tag{14.3.4}$$
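Result 14.3.4 is easy to check by simulation. The sketch below uses Python's `random.expovariate` (which takes the rate $1/\theta$, not the mean) with arbitrary choices $n = 20$, $r = 5$, $\theta = 2$. Each normalized spacing $Y_j$ should have mean and standard deviation close to $\theta$. A simulation like this only illustrates the result; it does not prove independence.

```python
import random
import statistics

def spacings(n, r, theta, rng):
    """Simulate one test of n exponential(mean theta) items stopped at failure r
    and return the normalized spacings Y_j = (n - j + 1)(X_(j) - X_(j-1))."""
    lifetimes = sorted(rng.expovariate(1 / theta) for _ in range(n))
    ys, prev = [], 0.0
    for j in range(1, r + 1):
        ys.append((n - j + 1) * (lifetimes[j - 1] - prev))
        prev = lifetimes[j - 1]
    return ys

rng = random.Random(12345)          # fixed seed so the run is reproducible
n, r, theta, reps = 20, 5, 2.0, 20000
samples = [spacings(n, r, theta, rng) for _ in range(reps)]
for j in range(r):
    col = [s[j] for s in samples]
    # For an exponential with mean theta, both mean and standard deviation equal theta
    print(j + 1, round(statistics.fmean(col), 3), round(statistics.stdev(col), 3))
```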

Hence, since the sum of independent and identically distributed exponential random
variables has a gamma distribution (Corollary 5.7.2), we see that

$$\tau \sim \text{gamma}(r, 1/\theta)$$

That is, $\tau$ has a gamma distribution with parameters $r$ and $1/\theta$. Equivalently, by recalling
that a gamma random variable with parameters $(r, 1/\theta)$ is equivalent to $\theta/2$ times a
chi-square random variable with $2r$ degrees of freedom (see Section 5.8.1), we obtain that

$$\frac{2\tau}{\theta} \sim \chi^2_{2r} \tag{14.3.5}$$
That is, $2\tau/\theta$ has a chi-square distribution with $2r$ degrees of freedom. Hence,

$$P\left\{\chi^2_{1-\alpha/2,\,2r} < \frac{2\tau}{\theta} < \chi^2_{\alpha/2,\,2r}\right\} = 1 - \alpha$$

and so a $100(1 - \alpha)$ percent confidence interval for $\theta$ is

$$\theta \in \left(\frac{2\tau}{\chi^2_{\alpha/2,\,2r}},\ \frac{2\tau}{\chi^2_{1-\alpha/2,\,2r}}\right) \tag{14.3.6}$$

One-sided confidence intervals can be similarly obtained.



EXAMPLE 14.3a A sample of 50 transistors is simultaneously put on a test that is to be
ended when the 15th failure occurs. If the total time on test of all transistors is equal to
525 hours, determine a 95 percent confidence interval for the mean lifetime of a transistor.
Assume that the underlying distribution is exponential.

SOLUTION From Program 5.8.1b,

$$\chi^2_{.025,\,30} = 46.979, \qquad \chi^2_{.975,\,30} = 16.791$$

and so, using Equation 14.3.6 with $2\tau = 1{,}050$, we can assert with 95 percent confidence that
$\theta \in (22.35, 62.53)$. ■
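Equation 14.3.6 needs chi-square percentiles, which the text reads from Program 5.8.1b. For the even degrees of freedom $2r$ that arise here, the chi-square distribution function can be written as a Poisson sum, $P\{\chi^2_{2r} \le x\} = 1 - \sum_{k=0}^{r-1} e^{-x/2}(x/2)^k/k!$, so a sketch using only the Python standard library is possible. The helper names are ours; a statistics library's chi-square quantile function would serve equally well. Computed this way, $\chi^2_{.975,30} \approx 16.791$ and the interval for Example 14.3a is approximately (22.35, 62.53).

```python
import math

def chi2_cdf_even(x, df):
    """P{chi-square(df) <= x} for even df, via P{chi2_{2r} <= x} = P{Poisson(x/2) >= r}."""
    if df % 2 or df <= 0:
        raise ValueError("df must be a positive even integer")
    if x <= 0:
        return 0.0
    r, m = df // 2, x / 2
    # Poisson(m) probabilities of 0, 1, ..., r-1, computed in log space
    return 1.0 - sum(math.exp(-m + k * math.log(m) - math.lgamma(k + 1)) for k in range(r))

def chi2_quantile_even(p, df):
    """Point c with P{chi2_df <= c} = p, found by bisection (df even)."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even(hi, df) < p:   # widen the bracket until it contains c
        hi *= 2
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_cdf_even(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exp_mean_ci(tau, r, alpha):
    """100(1 - alpha)% confidence interval for theta from Equation 14.3.6.
    The text's chi^2_{a,k} is the point with upper-tail area a."""
    upper_pt = chi2_quantile_even(1 - alpha / 2, 2 * r)   # chi^2_{alpha/2, 2r}
    lower_pt = chi2_quantile_even(alpha / 2, 2 * r)       # chi^2_{1-alpha/2, 2r}
    return 2 * tau / upper_pt, 2 * tau / lower_pt

# Example 14.3a: r = 15 failures, total time on test tau = 525 hours, 95 percent
print(exp_mean_ci(525, 15, 0.05))
```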

In testing a hypothesis about $\theta$, we can use Equation 14.3.6 to determine the $p$-value
of the test data. For instance, suppose we are interested in the one-sided test of

$$H_0: \theta \ge \theta_0$$

versus the alternative

$$H_1: \theta < \theta_0$$

This can be tested by first computing the value of the test statistic $2\tau/\theta_0$ (call this
value $v$) and then computing the probability that a chi-square random variable with $2r$
degrees of freedom would be as small as $v$. This probability is the $p$-value in the sense that
it represents the (maximal) probability that such a small value of $2\tau/\theta_0$ would have been
observed if $H_0$ were true. The hypothesis should then be rejected at all significance levels
at least as large as this $p$-value.

EXAMPLE 14.3b A producer of batteries claims that the lifetimes of the items it manufactures
are exponentially distributed with a mean life of at least 150 hours. To test this claim,
100 batteries are simultaneously put on a test that is slated to end when the 20th failure
occurs. If, at the end of the experiment, the total test time of all the 100 batteries is equal
to 1,800, should the manufacturer's claim be accepted?

SOLUTION Since $2\tau/\theta_0 = 3{,}600/150 = 24$, the $p$-value is

$$p\text{-value} = P\{\chi^2_{40} < 24\} = .021 \qquad \text{from Program 5.8.1a}$$

Hence, the manufacturer's claim should be rejected at the 5 percent level of significance
(indeed at any significance level at least as large as .021). ■
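The $p$-value of Example 14.3b can be reproduced with the Poisson-sum form of the chi-square distribution function for even degrees of freedom. The helper is repeated here so that the block stands alone, and its name is ours.

```python
import math

def chi2_cdf_even(x, df):
    """P{chi-square(df) <= x} for even df, via P{chi2_{2r} <= x} = P{Poisson(x/2) >= r}."""
    if df % 2 or df <= 0:
        raise ValueError("df must be a positive even integer")
    if x <= 0:
        return 0.0
    r, m = df // 2, x / 2
    return 1.0 - sum(math.exp(-m + k * math.log(m) - math.lgamma(k + 1)) for k in range(r))

r, tau, theta0 = 20, 1800, 150   # failures, total time on test, claimed mean (Example 14.3b)
v = 2 * tau / theta0             # observed value of the test statistic 2*tau/theta_0
p_value = chi2_cdf_even(v, 2 * r)
print(v, round(p_value, 3))      # 24.0 0.021
```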

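It follows from Equation 14.3.5 that the accuracy of the estimator $\tau/r$ depends only
on $r$ and not on $n$, the number of items put on test. The importance of $n$ resides in the
fact that by choosing it large enough we can ensure that the test is, with high probability,
of short duration. In fact, the moments of $X_{(r)}$, the time at which the test ends, are easily
obtained. Since, with $X_{(0)} = 0$,

$$X_{(j)} - X_{(j-1)} = \frac{Y_j}{n - j + 1}, \qquad j = 1, \ldots, r$$

it follows upon summing that

$$X_{(r)} = \sum_{j=1}^{r} \frac{Y_j}{n - j + 1}$$

Hence, from Equation 14.3.4, $X_{(r)}$ is the sum of $r$ independent exponentials having
respective means $\theta/n, \theta/(n-1), \ldots, \theta/(n-r+1)$. Using this, we see that

$$\begin{aligned}
E[X_{(r)}] &= \sum_{j=1}^{r} \frac{\theta}{n - j + 1} = \theta\sum_{j=n-r+1}^{n} \frac{1}{j} \\
\mathrm{Var}(X_{(r)}) &= \sum_{j=1}^{r}\left(\frac{\theta}{n - j + 1}\right)^2 = \theta^2\sum_{j=n-r+1}^{n} \frac{1}{j^2}
\end{aligned} \tag{14.3.7}$$

where the second equation uses the fact that the variance of an exponential is equal to the
square of its mean. For large $n$, we can approximate the preceding sums as follows:

$$\begin{aligned}
\sum_{j=n-r+1}^{n} \frac{1}{j} &\approx \int_{n-r+1}^{n} \frac{dx}{x} = \log\left(\frac{n}{n - r + 1}\right) \\
\sum_{j=n-r+1}^{n} \frac{1}{j^2} &\approx \int_{n-r+1}^{n} \frac{dx}{x^2} = \frac{1}{n - r + 1} - \frac{1}{n} = \frac{r - 1}{n(n - r + 1)}
\end{aligned}$$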
Thus, for instance, if in Example 14.3b the true mean life was 120 hours, then the
expectation and variance of the length of the test are approximately given by

$$\begin{aligned}
E[X_{(20)}] &\approx 120\log\left(\frac{100}{81}\right) = 25.29 \\
\mathrm{Var}(X_{(20)}) &\approx (120)^2\,\frac{19}{100(81)} = 33.78
\end{aligned}$$



14.3.2 Sequential Testing 

Suppose now that we have an infinite supply of items, each of whose lifetime is exponential
with an unknown mean $\theta$, which are to be tested sequentially, in that the first item is put
on test and on its failure the second is put on test, and so on. That is, as soon as an item
fails, it is immediately replaced on life test by the next item. We suppose that at some fixed
time $T$ the test ends.

The observed data will consist of the following:

$$\text{Data: } r, x_1, x_2, \ldots, x_r$$

with the interpretation that there has been a total of $r$ failures with the $i$th item on test
having functioned for a time $x_i$. Now the foregoing will be the observed data if

$$X_i = x_i,\ i = 1, \ldots, r, \qquad \sum_{i=1}^{r} x_i < T, \qquad X_{r+1} > T - \sum_{i=1}^{r} x_i \tag{14.3.8}$$

where $X_i$ is the functional lifetime of the $i$th item to be put in use. This follows since
in order for there to be $r$ failures, the $r$th failure must occur before time $T$ (and so
$\sum_{i=1}^{r} X_i < T$), and the functional life of the $(r + 1)$st item must exceed $T - \sum_{i=1}^{r} X_i$
(see Figure 14.1).



FIGURE 14.1 $r$ failures by time $T$. (Time axis marking the time of the $r$th failure, before $T$, and the time of the $(r+1)$st failure, after $T$.)
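From Equation 14.3.8, we obtain that the likelihood of the data $r, x_1, \ldots, x_r$ is as follows:

$$f(r, x_1, \ldots, x_r \mid \theta) = f_{X_1, \ldots, X_r}(x_1, \ldots, x_r)\, P\left\{X_{r+1} > T - \sum_{i=1}^{r} x_i\right\}, \qquad \sum_{i=1}^{r} x_i < T$$

$$= \frac{1}{\theta^r} e^{-\sum_{i=1}^{r} x_i/\theta}\, e^{-(T - \sum_{i=1}^{r} x_i)/\theta}$$

$$= \frac{1}{\theta^r} e^{-T/\theta}$$

Therefore,

$$\log f(r, x_1, \ldots, x_r \mid \theta) = -r \log \theta - \frac{T}{\theta}$$

and so

$$\frac{\partial}{\partial \theta} \log f(r, x_1, \ldots, x_r \mid \theta) = -\frac{r}{\theta} + \frac{T}{\theta^2}$$

On equating to 0 and solving, we obtain that the maximum likelihood estimate for $\theta$ is

$$\hat{\theta} = \frac{T}{r}$$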



Since T is the total time on test of all items, it follows once again that the maximum 
likelihood estimate of the unknown exponential mean is equal to the total time on test 
divided by the number of observed failures in this time. 
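A small simulation makes the data $r, x_1, \ldots, x_r$ concrete. The sketch below is illustrative only: the function names, the seed, and the choice $\theta = 50$, $T = 500$ are our assumptions. The loop stops exactly when the next item's life exceeds the remaining time $T - \sum x_i$, which is condition 14.3.8, and the estimate is then $T/r$.

```python
import random


def sequential_life_test(theta: float, T: float, rng: random.Random):
    """Simulate one-at-a-time testing that stops at the fixed time T.

    Returns (r, lifetimes): r failures occurred by time T, and lifetimes
    holds the functional lives x_1, ..., x_r of the failed items.
    """
    elapsed, lifetimes = 0.0, []
    while True:
        life = rng.expovariate(1.0 / theta)  # exponential with mean theta
        if elapsed + life > T:               # item r+1 outlives the test (14.3.8)
            return len(lifetimes), lifetimes
        elapsed += life
        lifetimes.append(life)


def sequential_mle(r: int, T: float) -> float:
    """Maximum likelihood estimate T/r of the mean life."""
    if r == 0:
        raise ValueError("no failures by time T; T/r is undefined")
    return T / r


rng = random.Random(2024)
r, xs = sequential_life_test(theta=50.0, T=500.0, rng=rng)
print("failures:", r, " estimate:", sequential_mle(r, 500.0) if r else "undefined")
```

Note that the simulated lifetimes enter the estimate only through $r$; as the likelihood shows, the individual $x_i$ carry no further information about $\theta$.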

If we let $N(T)$ denote the number of failures by time $T$, then the maximum likelihood estimator of $\theta$ is $T/N(T)$. Suppose now that the observed value of $N(T)$ is $N(T) = r$. To determine a $100(1-\alpha)$ percent confidence interval estimate for $\theta$, we will first determine the values $\theta_L$ and $\theta_U$, which are such that

$$P_{\theta_U}\{N(T) \ge r\} = \frac{\alpha}{2}, \qquad P_{\theta_L}\{N(T) \le r\} = \frac{\alpha}{2}$$

where by $P_\theta(A)$ we mean that we are computing the probability of the event $A$ under the supposition that $\theta$ is the true mean. The $100(1-\alpha)$ percent confidence interval estimate for $\theta$ is

$$\theta \in (\theta_L, \theta_U)$$

To understand why those values of $\theta$ for which either $\theta < \theta_L$ or $\theta > \theta_U$ are not included in the confidence interval, note that $P_\theta\{N(T) \ge r\}$ decreases and $P_\theta\{N(T) \le r\}$ increases in $\theta$ (why?). Hence,

$$\text{if } \theta < \theta_L, \text{ then } P_\theta\{N(T) \le r\} < P_{\theta_L}\{N(T) \le r\} = \frac{\alpha}{2}$$

$$\text{if } \theta > \theta_U, \text{ then } P_\theta\{N(T) \ge r\} < P_{\theta_U}\{N(T) \ge r\} = \frac{\alpha}{2}$$

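It remains to determine $\theta_L$ and $\theta_U$. To do so, note first that the event that $N(T) \ge r$ is equivalent to the statement that the $r$th failure occurs before or at time $T$. That is,

$$N(T) \ge r \iff X_1 + \cdots + X_r \le T$$

and so

$$P_\theta\{N(T) \ge r\} = P_\theta\{X_1 + \cdots + X_r \le T\}$$

$$= P\{\gamma(r, 1/\theta) \le T\}$$

$$= P\left\{\frac{\theta}{2}\chi^2_{2r} \le T\right\}$$

$$= P\{\chi^2_{2r} \le 2T/\theta\}$$

where $\gamma(r, 1/\theta)$ denotes a gamma random variable with parameters $(r, 1/\theta)$. Hence, upon evaluating the foregoing at $\theta = \theta_U$, and using the fact that $P\{\chi^2_{2r} \le \chi^2_{1-\alpha/2,2r}\} = \alpha/2$, we obtain that

$$\frac{\alpha}{2} = P\left\{\chi^2_{2r} \le \frac{2T}{\theta_U}\right\}$$

and that

$$\frac{2T}{\theta_U} = \chi^2_{1-\alpha/2,2r}, \qquad \text{or} \qquad \theta_U = 2T/\chi^2_{1-\alpha/2,2r}$$

Similarly, we can show that

$$\theta_L = 2T/\chi^2_{\alpha/2,2r}$$

and thus the $100(1-\alpha)$ percent confidence interval estimate for $\theta$ is

$$\theta \in \left(2T/\chi^2_{\alpha/2,2r},\ 2T/\chi^2_{1-\alpha/2,2r}\right)$$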
EXAMPLE 14.3c If a one-at-a-time sequential test yields 10 failures in the fixed time of $T = 500$ hours, then the maximum likelihood estimate of $\theta$ is $500/10 = 50$ hours. A 95 percent confidence interval estimate of $\theta$ is

$$\theta \in \left(1{,}000/\chi^2_{.025,20},\ 1{,}000/\chi^2_{.975,20}\right)$$

The needed chi-square values, which can be obtained by running Program 5.8.1b, are

$$\chi^2_{.025,20} = 34.17, \qquad \chi^2_{.975,20} = 9.59$$

and so, with 95 percent confidence,

$$\theta \in (29.27, 104.27) \qquad ■$$
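The interval needs only chi-square percentage points with an even number of degrees of freedom, and for $2k$ degrees of freedom the chi-square distribution function has the closed form $P\{\chi^2_{2k} \le x\} = P\{\text{Poisson}(x/2) \ge k\}$ (the identity of Problem 14 at the end of this chapter). The sketch below uses that identity together with bisection in place of Program 5.8.1b, so it needs only the Python standard library; the function names are ours. It computes the interval of Example 14.3c ($r = 10$, $T = 500$, $\alpha = .05$). It is meant for moderate degrees of freedom: for very large $x$ the factor $e^{-x/2}$ underflows, and a library chi-square quantile routine would be the better tool.

```python
import math


def chi2_cdf_even(x: float, df: int) -> float:
    """P{chi^2_df <= x} for a positive even df, using the Poisson identity
    P{chi^2_{2k} <= x} = P{Poisson(x/2) >= k}."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    if x <= 0:
        return 0.0
    k, m = df // 2, x / 2.0
    term, below = math.exp(-m), 0.0   # below accumulates P{Poisson(m) <= k-1}
    for j in range(k):
        below += term
        term *= m / (j + 1)
    return 1.0 - below


def chi2_upper_point(a: float, df: int) -> float:
    """chi^2_{a,df} in the book's notation: P{chi^2_df >= chi^2_{a,df}} = a."""
    lo, hi = 0.0, 10.0 * df + 100.0
    for _ in range(200):              # bisection on the increasing CDF
        mid = (lo + hi) / 2.0
        if chi2_cdf_even(mid, df) < 1.0 - a:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def sequential_ci(r: int, T: float, alpha: float):
    """(2T/chi^2_{alpha/2,2r}, 2T/chi^2_{1-alpha/2,2r}), as derived above."""
    return (2 * T / chi2_upper_point(alpha / 2, 2 * r),
            2 * T / chi2_upper_point(1 - alpha / 2, 2 * r))


lo, hi = sequential_ci(r=10, T=500.0, alpha=0.05)
print(f"95 percent interval for theta: ({lo:.2f}, {hi:.2f})")
```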

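If we wanted to test the hypothesis

$$H_0: \theta = \theta_0$$

versus the alternative

$$H_1: \theta \ne \theta_0$$

then we would first determine the value of $N(T)$. If $N(T) = r$, then the hypothesis would be rejected provided either

$$P_{\theta_0}\{N(T) \le r\} \le \frac{\alpha}{2} \qquad \text{or} \qquad P_{\theta_0}\{N(T) \ge r\} \le \frac{\alpha}{2}$$

In other words, $H_0$ would be rejected at all significance levels greater than or equal to the p-value given by

$$p\text{-value} = 2\min\left(P_{\theta_0}\{N(T) \ge r\},\ P_{\theta_0}\{N(T) \le r\}\right)$$

$$= 2\min\left(P_{\theta_0}\{N(T) \ge r\},\ 1 - P_{\theta_0}\{N(T) \ge r+1\}\right)$$

$$= 2\min\left(P\left\{\chi^2_{2r} \le \frac{2T}{\theta_0}\right\},\ 1 - P\left\{\chi^2_{2(r+1)} \le \frac{2T}{\theta_0}\right\}\right)$$

The p-value for a one-sided test is similarly obtained.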

The chi-square probabilities in the foregoing can be computed by making use of 
Program 5.8.1a. 

EXAMPLE 14.3d A company claims that the mean lifetime of the semiconductors it produces is at least 25 hours. To substantiate this claim, an independent testing service has decided to sequentially test, one at a time, the company's semiconductors for 600 hours. If 30 semiconductors failed during this period, what can we say about the validity of the company's claim? Test at the 10 percent level.

SOLUTION This is a one-sided test of

$$H_0: \theta \ge 25 \qquad \text{versus} \qquad H_1: \theta < 25$$

The relevant probability for determining the p-value is the probability that there would have been at least 30 failures if the mean life were 25. That is,

$$p\text{-value} = P_{25}\{N(600) \ge 30\}$$

$$= P\{\chi^2_{60} \le 1{,}200/25\}$$

$$= .132 \qquad \text{(from Program 5.8.1a)}$$

Thus, $H_0$ would be accepted when the significance level is .10. ■
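Both the two-sided p-value formula and the one-sided version used in Example 14.3d reduce to chi-square probabilities with even degrees of freedom, so the same Poisson identity gives a standard-library sketch (function names ours). The two-sided function caps its result at 1, because the events $\{N(T) \ge r\}$ and $\{N(T) \le r\}$ overlap at $N(T) = r$, so twice the smaller probability can slightly exceed 1. Applied to Example 14.3d, the one-sided function should agree with the .132 quoted above up to rounding.

```python
import math


def chi2_cdf_even(x: float, df: int) -> float:
    """P{chi^2_df <= x} for a positive even df (Poisson identity)."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    if x <= 0:
        return 0.0
    k, m = df // 2, x / 2.0
    term, below = math.exp(-m), 0.0   # below accumulates P{Poisson(m) <= k-1}
    for j in range(k):
        below += term
        term *= m / (j + 1)
    return 1.0 - below


def p_value_two_sided(r: int, T: float, theta0: float) -> float:
    """2 min(P{chi^2_{2r} <= 2T/theta0}, 1 - P{chi^2_{2(r+1)} <= 2T/theta0}), capped at 1."""
    v = 2.0 * T / theta0
    p = 2.0 * min(chi2_cdf_even(v, 2 * r), 1.0 - chi2_cdf_even(v, 2 * (r + 1)))
    return min(p, 1.0)


def p_value_mean_below(r: int, T: float, theta0: float) -> float:
    """H0: theta >= theta0 vs H1: theta < theta0:
    P_{theta0}{N(T) >= r} = P{chi^2_{2r} <= 2T/theta0}."""
    return chi2_cdf_even(2.0 * T / theta0, 2 * r)


p = p_value_mean_below(r=30, T=600.0, theta0=25.0)
print(f"Example 14.3d p-value: {p:.3f}")
```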

14.3.3 Simultaneous Testing — Stopping by a Fixed Time 

Suppose again that we are testing items whose life distributions are independent exponential random variables with a common unknown mean $\theta$. As in Section 14.3.1, the $n$ items are simultaneously put on test, but now we suppose that the test is to stop either at some fixed time $T$ or whenever all $n$ items have failed, whichever occurs first. The problem is to use the observed data to estimate $\theta$.
The observed data will be as follows:

Data: $i_1, i_2, \ldots, i_r, x_1, x_2, \ldots, x_r$

with the interpretation that the preceding results when the $r$ items numbered $i_1, \ldots, i_r$ are observed to fail at respective times $x_1, \ldots, x_r$ and the other $n - r$ items have not failed by time $T$.

Since an item will not have failed by time $T$ if and only if its lifetime is greater than $T$, we see that the likelihood of the foregoing data is

$$f(i_1, \ldots, i_r, x_1, \ldots, x_r) = f_{X_{i_1}, \ldots, X_{i_r}}(x_1, \ldots, x_r)\, P\{X_j > T,\ j \ne i_1, \ldots, i_r\}$$

$$= \frac{1}{\theta} e^{-x_1/\theta} \cdots \frac{1}{\theta} e^{-x_r/\theta} \left(e^{-T/\theta}\right)^{n-r}$$

$$= \frac{1}{\theta^r} \exp\left\{-\frac{\sum_{i=1}^{r} x_i + (n-r)T}{\theta}\right\}$$

To obtain the maximum likelihood estimates, take logs to obtain

$$\log f(i_1, \ldots, i_r, x_1, \ldots, x_r) = -r \log \theta - \frac{\sum_{i=1}^{r} x_i + (n-r)T}{\theta}$$

Hence,

$$\frac{\partial}{\partial \theta} \log f(i_1, \ldots, i_r, x_1, \ldots, x_r) = -\frac{r}{\theta} + \frac{\sum_{i=1}^{r} x_i + (n-r)T}{\theta^2}$$

Equating to 0 and solving yields that $\hat{\theta}$, the maximum likelihood estimate, is given by

$$\hat{\theta} = \frac{\sum_{i=1}^{r} x_i + (n-r)T}{r}$$

Hence, if we let $R$ denote the number of items that fail by time $T$ and let $X_{(i)}$ be the $i$th smallest of the failure times, $i = 1, \ldots, R$, then the maximum likelihood estimator of $\theta$ is

$$\hat{\theta} = \frac{\sum_{i=1}^{R} X_{(i)} + (n-R)T}{R}$$

Let $\tau$ denote the sum of the times that all items are on life test. Then, because the $R$ items that fail are on test for times $X_{(1)}, \ldots, X_{(R)}$ whereas the $n - R$ nonfailed items are all on test for time $T$, it follows that

$$\tau = \sum_{i=1}^{R} X_{(i)} + (n-R)T$$

and thus we can write the maximum likelihood estimator as

$$\hat{\theta} = \frac{\tau}{R}$$

In words, the maximum likelihood estimator of the mean life is (as in the life testing 
procedures of Sections 14.3.1 and 14.3.2) equal to the total time on test divided by the 
number of items observed to fail. 
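The estimator $\hat{\theta} = \tau/R$ is straightforward to compute once a test has been run. As an illustration only (the function name, seed, and parameter values are hypothetical), the sketch below simulates one such test and forms $\tau$ exactly as in the display above. When no item fails, $\tau/R$ is undefined, so the code reports that case separately.

```python
import random


def fixed_time_simultaneous_test(n: int, theta: float, T: float, rng: random.Random):
    """Test n items at once until time T (or until all fail, if sooner).

    Returns (R, tau): R items fail by time T, and tau is the total time
    on test, the sum of the R failure times plus (n - R) * T.
    """
    lifetimes = [rng.expovariate(1.0 / theta) for _ in range(n)]
    failures = sorted(x for x in lifetimes if x <= T)  # X_(1) <= ... <= X_(R)
    R = len(failures)
    tau = sum(failures) + (n - R) * T  # each survivor was on test for time T
    return R, tau


rng = random.Random(11)
R, tau = fixed_time_simultaneous_test(n=20, theta=100.0, T=50.0, rng=rng)
print("R =", R, " tau =", round(tau, 2), " estimate:", tau / R if R else "undefined")
```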

REMARK 

As the reader may possibly have surmised, it turns out that for all possible life testing schemes for the exponential distribution, the maximum likelihood estimator of the unknown mean $\theta$ will always be equal to the total time on test divided by the number of observed failures. To see why this is true, consider any testing situation and suppose that the outcome of the data is that $r$ items are observed to fail after having been on test for times $x_1, \ldots, x_r$, respectively, and that $s$ items have not yet failed when the test ends, at which time they had been on test for respective times $y_1, \ldots, y_s$. The likelihood of this outcome will be

$$\text{likelihood} = K \, \frac{1}{\theta} e^{-x_1/\theta} \cdots \frac{1}{\theta} e^{-x_r/\theta}\, e^{-y_1/\theta} \cdots e^{-y_s/\theta}$$

$$= \frac{K}{\theta^r} \exp\left\{-\frac{\sum_{i=1}^{r} x_i + \sum_{i=1}^{s} y_i}{\theta}\right\} \qquad (14.3.9)$$

where $K$, which is a function of the testing scheme and the data, does not depend on $\theta$. (For instance, $K$ may relate to a testing procedure in which the decision as to when to stop depends not only on the observed data but is allowed to be random.) It follows from the foregoing that the maximum likelihood estimate of $\theta$ will be

$$\hat{\theta} = \frac{\sum_{i=1}^{r} x_i + \sum_{i=1}^{s} y_i}{r} \qquad (14.3.10)$$

But $\sum_{i=1}^{r} x_i + \sum_{i=1}^{s} y_i$ is just the total-time-on-test statistic, and so the maximum likelihood estimator of $\theta$ is indeed the total time on test divided by the number of observed failures in that time.
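Equation 14.3.10 needs only the failure times and the times on test of the unfailed items. The minimal sketch below (function names ours) computes it and checks numerically that it maximizes the $\theta$-dependent part of 14.3.9. As sample input it borrows the failure and censoring times listed in Example 14.3e, later in this section.

```python
import math


def total_time_on_test_mle(failure_times, censored_times):
    """theta_hat = (sum of x_i + sum of y_j) / r  (Equation 14.3.10)."""
    r = len(failure_times)
    if r == 0:
        raise ValueError("at least one observed failure is required")
    tau = sum(failure_times) + sum(censored_times)
    return tau / r


def log_likelihood(theta: float, r: int, tau: float) -> float:
    """log of (K / theta^r) exp(-tau/theta), dropping the constant log K."""
    return -r * math.log(theta) - tau / theta


# Failure times and times on test of unfailed items from Example 14.3e (hours).
xs = [5, 7, 6.2, 8.1, 7.9, 15, 18, 3.9, 4.6, 5.8]
ys = [3, 3.2, 4.1, 1.8, 1.6, 2.7, 1.2, 5.4, 10.3, 1.5]
theta_hat = total_time_on_test_mle(xs, ys)
tau = sum(xs) + sum(ys)
# The log-likelihood should be lower a little to either side of theta_hat.
for theta in (0.9 * theta_hat, theta_hat, 1.1 * theta_hat):
    print(f"theta = {theta:7.3f}  log-likelihood = {log_likelihood(theta, len(xs), tau):.4f}")
```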

The distribution of $\tau/R$ is rather complicated for the life testing scheme described in this section,* and thus we will not be able to easily derive a confidence interval estimator for $\theta$. Indeed, we will not further pursue this problem but rather will consider the Bayesian approach to estimating $\theta$.

* For instance, for the scheme considered, $\tau$ and $R$ are not only both random but are also dependent.

14.3.4 The Bayesian Approach

Suppose that items having independent and identically distributed exponential lifetimes with an unknown mean $\theta$ are put on life test. Then as noted in the remark given in Section 14.3.3, the likelihood of the data can be expressed as

$$f(\text{data} \mid \theta) = \frac{K}{\theta^r} e^{-\tau/\theta}$$

where $\tau$ is the total time on test (that is, the sum of the time on test of all items used) and $r$ is the number of observed failures for the given data.

Let $\lambda = 1/\theta$ denote the rate of the exponential distribution. In the Bayesian approach, it is more convenient to work with the rate $\lambda$ rather than its reciprocal. From the foregoing we see that

$$f(\text{data} \mid \lambda) = K \lambda^r e^{-\lambda \tau}$$

If we suppose, prior to testing, that $\lambda$ is distributed according to the prior density $g(\lambda)$, then the posterior density of $\lambda$ given the observed data is as follows:

$$f(\lambda \mid \text{data}) = \frac{f(\text{data} \mid \lambda)\, g(\lambda)}{\int f(\text{data} \mid \lambda)\, g(\lambda)\, d\lambda} = \frac{\lambda^r e^{-\lambda\tau} g(\lambda)}{\int \lambda^r e^{-\lambda\tau} g(\lambda)\, d\lambda} \qquad (14.3.11)$$



The preceding posterior density becomes particularly convenient to work with when $g$ is a gamma density function with parameters, say, $(b, a)$, that is, when

$$g(\lambda) = \frac{a e^{-a\lambda} (a\lambda)^{b-1}}{\Gamma(b)}, \qquad \lambda > 0$$

for some positive constants $a$ and $b$. Indeed for this choice of $g$ we have from Equation 14.3.11 that

$$f(\lambda \mid \text{data}) = C e^{-(a+\tau)\lambda} \lambda^{r+b-1}$$

$$= K e^{-(a+\tau)\lambda} [(a+\tau)\lambda]^{b+r-1}$$

where $C$ and $K$ do not depend on $\lambda$. Because we recognize the preceding as the gamma density with parameters $(b + r, a + \tau)$, we can rewrite it as

$$f(\lambda \mid \text{data}) = \frac{(a+\tau) e^{-(a+\tau)\lambda} [(a+\tau)\lambda]^{b+r-1}}{\Gamma(b+r)}, \qquad \lambda > 0$$

In other words, if the prior distribution of $\lambda$ is gamma with parameters $(b, a)$, then no matter what the testing scheme, the (posterior) conditional distribution of $\lambda$ given the data is gamma with parameters $(b + R, a + \tau)$, where $\tau$ and $R$ represent respectively the total-time-on-test statistic and the number of observed failures. Because the mean of a gamma random variable with parameters $(b, a)$ is equal to $b/a$ (see Section 5.7), we can conclude that $E[\lambda \mid \text{data}]$, the Bayes estimator of $\lambda$, is

$$E[\lambda \mid \text{data}] = \frac{b + R}{a + \tau}$$

EXAMPLE 14.3e Suppose that 20 items having an exponential life distribution with an unknown rate $\lambda$ are put on life test at various times. When the test is ended, there have been 10 observed failures, their lifetimes being (in hours) 5, 7, 6.2, 8.1, 7.9, 15, 18, 3.9, 4.6, 5.8. The 10 items that did not fail had, at the time the test was terminated, been on test for times (in hours) 3, 3.2, 4.1, 1.8, 1.6, 2.7, 1.2, 5.4, 10.3, 1.5. If prior to the testing it was felt that $\lambda$ could be viewed as being a gamma random variable with parameters (2, 20), what is the Bayes estimator of $\lambda$?

SOLUTION Since

$$\tau = 116.3, \qquad R = 10$$

it follows that the Bayes estimate of $\lambda$ is

$$E[\lambda \mid \text{data}] = \frac{12}{136.3} = .088 \qquad ■$$
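Because the posterior is again gamma, the Bayes computation is a simple parameter update. The sketch below (function name ours) recomputes $\tau$ directly from the times listed in Example 14.3e and reports the posterior mean together with the posterior standard deviation $\sqrt{b+R}/(a+\tau)$. That second quantity follows from the variance $b/a^2$ of a gamma random variable with parameters $(b, a)$ and indicates how much uncertainty about $\lambda$ remains after the test.

```python
def gamma_posterior(b: float, a: float, R: int, tau: float):
    """Gamma(b, a) prior on lambda -> Gamma(b + R, a + tau) posterior."""
    return b + R, a + tau


failures = [5, 7, 6.2, 8.1, 7.9, 15, 18, 3.9, 4.6, 5.8]
survivors = [3, 3.2, 4.1, 1.8, 1.6, 2.7, 1.2, 5.4, 10.3, 1.5]
tau = sum(failures) + sum(survivors)          # total time on test
shape, rate = gamma_posterior(b=2, a=20, R=len(failures), tau=tau)
bayes_estimate = shape / rate                 # posterior mean (b + R)/(a + tau)
posterior_sd = shape ** 0.5 / rate            # gamma(b, a) has sd sqrt(b)/a
print(f"tau = {tau:.1f}, E[lambda | data] = {bayes_estimate:.3f}, sd = {posterior_sd:.4f}")
```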



REMARK 

As we have seen, the choice of a gamma prior distribution for the rate of an exponential distribution makes the resulting computations quite simple. Although, from an applied viewpoint, this is not by itself a sufficient rationale, such a choice is often made, one justification being that the flexibility in fixing the two parameters of the gamma prior usually enables one to approximate one's true prior beliefs reasonably well.



14.4 A TWO-SAMPLE PROBLEM 

A company has set up two separate plants to produce vacuum tubes. The company supposes that tubes produced at Plant I function for an exponentially distributed time with an unknown mean $\theta_1$, whereas those produced at Plant II function for an exponentially distributed time with unknown mean $\theta_2$. To test the hypothesis that there is no difference between the two plants (at least in regard to the lifetimes of the tubes they produce), the company samples $n$ tubes from Plant I and $m$ from Plant II and then utilizes these tubes to determine their lifetimes. How can they thus determine whether the two plants are indeed identical?

If we let $X_1, \ldots, X_n$ denote the lifetimes of the $n$ tubes produced at Plant I and $Y_1, \ldots, Y_m$ denote the lifetimes of the $m$ tubes produced at Plant II, then the problem is to test the hypothesis that $\theta_1 = \theta_2$ when the $X_i$, $i = 1, \ldots, n$, are a random sample from an exponential distribution with mean $\theta_1$ and the $Y_i$, $i = 1, \ldots, m$, are a random sample from an exponential distribution with mean $\theta_2$. Moreover, the two samples are supposed to be independent.

To develop a test of the hypothesis that $\theta_1 = \theta_2$, let us begin by noting that $\sum_{i=1}^{n} X_i$ and $\sum_{i=1}^{m} Y_i$ (being sums of independent and identically distributed exponentials) are independent gamma random variables with respective parameters $(n, 1/\theta_1)$ and $(m, 1/\theta_2)$.
Hence, by the equivalence of the gamma and chi-square distributions it follows that

$$\frac{2}{\theta_1}\sum_{i=1}^{n} X_i \sim \chi^2_{2n}, \qquad \frac{2}{\theta_2}\sum_{i=1}^{m} Y_i \sim \chi^2_{2m}$$

Hence, it follows from the definition of the F-distribution that

$$\frac{\dfrac{2}{\theta_1}\sum_{i=1}^{n} X_i \Big/ 2n}{\dfrac{2}{\theta_2}\sum_{i=1}^{m} Y_i \Big/ 2m} \sim F_{2n,2m}$$

That is, if $\bar{X}$ and $\bar{Y}$ are the two sample means, respectively, then

$$\frac{\theta_2 \bar{X}}{\theta_1 \bar{Y}} \text{ has an F-distribution with } 2n \text{ and } 2m \text{ degrees of freedom}$$

Hence, when the hypothesis $\theta_1 = \theta_2$ is true, we see that $\bar{X}/\bar{Y}$ has an F-distribution with $2n$ and $2m$ degrees of freedom. This suggests the following test of the hypothesis that $\theta_1 = \theta_2$.

Test: $H_0: \theta_1 = \theta_2$ vs. alternative $H_1: \theta_1 \ne \theta_2$

Step 1: Choose a significance level $\alpha$.

Step 2: Determine the value of the test statistic $\bar{X}/\bar{Y}$; say its value is $v$.

Step 3: Compute $P\{F \le v\}$ where $F \sim F_{2n,2m}$. If this probability is either less than $\alpha/2$ (which occurs when $\bar{X}$ is significantly less than $\bar{Y}$) or greater than $1 - \alpha/2$ (which occurs when $\bar{X}$ is significantly greater than $\bar{Y}$), then the hypothesis is rejected.

In other words, the p-value of the test data is given by

$$p\text{-value} = 2\min\left(P\{F \le v\},\ 1 - P\{F \le v\}\right)$$

EXAMPLE 14.4a Test the hypothesis, at the 5 percent level of significance, that the lifetimes of items produced at two given plants have the same exponential life distribution if a sample of size 10 from the first plant has a total lifetime of 420 hours whereas a sample of 15 from the second plant has a total lifetime of 510 hours.

SOLUTION The value of the test statistic $\bar{X}/\bar{Y}$ is $42/34 = 1.2353$. The probability that an F-random variable with parameters 20, 30 is less than this value, which can be computed with Program 5.8.3a, is

$$P\{F_{20,30} < 1.2353\} = .7064$$

Because the p-value is equal to $2(1 - .7064) = .5872$, we cannot reject $H_0$. ■
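Because $\frac{2}{\theta_1}\sum X_i$ and $\frac{2}{\theta_2}\sum Y_i$ are chi-square with $2n$ and $2m$ degrees of freedom, under $H_0$ the ratio $\bar{X}/\bar{Y}$ has an F-distribution with the even degrees of freedom $2n$ and $2m$, and its distribution function can then be computed exactly as a finite binomial sum (via the incomplete beta function). The sketch below is ours and stands in for Program 5.8.3a; it needs Python 3.8 or later for `math.comb`. Applied to the data of Example 14.4a, it reproduces the statistic 1.2353.

```python
import math


def f_cdf_even(v: float, n: int, m: int) -> float:
    """P{F <= v} for F ~ F_{2n,2m}, with n and m positive integers.

    Uses P{F_{2n,2m} <= v} = I_x(n, m) with x = n v / (n v + m), and, for
    integer n and m, I_x(n, m) = P{Binomial(n + m - 1, x) >= n}.
    """
    if v <= 0:
        return 0.0
    x = n * v / (n * v + m)
    N = n + m - 1
    return sum(math.comb(N, k) * x ** k * (1 - x) ** (N - k) for k in range(n, N + 1))


def exponential_two_sample_test(sum_x: float, n: int, sum_y: float, m: int):
    """Statistic v = Xbar/Ybar and two-sided p-value 2 min(P{F<=v}, 1 - P{F<=v})."""
    v = (sum_x / n) / (sum_y / m)
    cdf = f_cdf_even(v, n, m)
    return v, 2.0 * min(cdf, 1.0 - cdf)


v, p = exponential_two_sample_test(420.0, 10, 510.0, 15)
print(f"v = {v:.4f}, p-value = {p:.4f}")
```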

14.5 THE WEIBULL DISTRIBUTION IN LIFE TESTING 

Whereas the exponential distribution arises as the life distribution when the hazard rate function $\lambda(t)$ is assumed to be constant over time, there are many situations in which it is more realistic to suppose that $\lambda(t)$ either increases or decreases over time. One example of such a hazard rate function is given by

$$\lambda(t) = \alpha\beta t^{\beta-1}, \qquad t > 0 \qquad (14.5.1)$$

where $\alpha$ and $\beta$ are positive constants. The distribution whose hazard rate function is given by Equation 14.5.1 is called the Weibull distribution with parameters $(\alpha, \beta)$. Note that $\lambda(t)$ increases when $\beta > 1$, decreases when $\beta < 1$, and is constant (reducing to the exponential) when $\beta = 1$.

The Weibull distribution function is obtained from Equation 14.5.1 as follows:

$$F(t) = 1 - \exp\left\{-\int_0^t \lambda(s)\, ds\right\}, \qquad t > 0$$

$$= 1 - \exp\{-\alpha t^\beta\}$$

Differentiating yields its density function:

$$f(t) = \alpha\beta t^{\beta-1} \exp\{-\alpha t^\beta\}, \qquad t > 0 \qquad (14.5.2)$$

This density is plotted for a variety of values of $\alpha$ and $\beta$ in Figure 14.2.
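A quick numerical check of Equations 14.5.1 and 14.5.2: the hazard rate should equal $f(t)/(1 - F(t))$, and it should increase, stay constant, or decrease according as $\beta > 1$, $\beta = 1$, or $\beta < 1$. The sketch uses the hypothetical values $\alpha = .3$ and $t = 1.7$; the function names are ours.

```python
import math


def weibull_hazard(t: float, alpha: float, beta: float) -> float:
    return alpha * beta * t ** (beta - 1)                                  # Equation 14.5.1


def weibull_cdf(t: float, alpha: float, beta: float) -> float:
    return 1.0 - math.exp(-alpha * t ** beta)


def weibull_pdf(t: float, alpha: float, beta: float) -> float:
    return alpha * beta * t ** (beta - 1) * math.exp(-alpha * t ** beta)  # Equation 14.5.2


alpha, t = 0.3, 1.7
for beta in (0.5, 1.0, 2.0):
    # f/(1 - F) at t, the hazard at t, then the hazard at t = 1 and t = 2.
    ratio = weibull_pdf(t, alpha, beta) / (1.0 - weibull_cdf(t, alpha, beta))
    print(beta, round(ratio, 6), round(weibull_hazard(t, alpha, beta), 6),
          weibull_hazard(1.0, alpha, beta), weibull_hazard(2.0, alpha, beta))
```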

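Suppose now that $X_1, \ldots, X_n$ are independent Weibull random variables each having parameters $(\alpha, \beta)$, which are assumed unknown. To estimate $\alpha$ and $\beta$, we can employ the maximum likelihood approach. Equation 14.5.2 yields the likelihood, given by

$$f(x_1, \ldots, x_n) = \alpha^n \beta^n x_1^{\beta-1} \cdots x_n^{\beta-1} \exp\left\{-\alpha \sum_{i=1}^{n} x_i^\beta\right\}$$

Hence,

$$\log f(x_1, \ldots, x_n) = n \log \alpha + n \log \beta + (\beta - 1)\sum_{i=1}^{n} \log x_i - \alpha \sum_{i=1}^{n} x_i^\beta$$

FIGURE 14.2 Weibull density functions.

and so

$$\frac{\partial}{\partial \alpha} \log f(x_1, \ldots, x_n) = \frac{n}{\alpha} - \sum_{i=1}^{n} x_i^\beta$$

$$\frac{\partial}{\partial \beta} \log f(x_1, \ldots, x_n) = \frac{n}{\beta} + \sum_{i=1}^{n} \log x_i - \alpha \sum_{i=1}^{n} x_i^\beta \log x_i$$

Equating to zero shows that the maximum likelihood estimates $\hat{\alpha}$ and $\hat{\beta}$ are the solutions of

$$\frac{n}{\hat{\alpha}} = \sum_{i=1}^{n} x_i^{\hat{\beta}}$$

$$\frac{n}{\hat{\beta}} + \sum_{i=1}^{n} \log x_i = \hat{\alpha} \sum_{i=1}^{n} x_i^{\hat{\beta}} \log x_i$$

or, equivalently,

$$\hat{\alpha} = \frac{n}{\sum_{i=1}^{n} x_i^{\hat{\beta}}}$$

$$n + \hat{\beta} \log\left(\prod_{i=1}^{n} x_i\right) = \frac{n \hat{\beta} \sum_{i=1}^{n} x_i^{\hat{\beta}} \log x_i}{\sum_{i=1}^{n} x_i^{\hat{\beta}}}$$

This latter equation can then be solved numerically for $\hat{\beta}$, which will then also determine $\hat{\alpha}$. However, rather than pursuing this approach any further, let us consider a second approach, which is not only computationally easier but appears, as indicated by a simulation study, to yield more accurate estimates.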
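For completeness, the likelihood equation for $\hat{\beta}$ can be solved with a simple root finder. Dividing it by $\hat{\beta}$ gives $g(\beta) = n/\beta + \sum \log x_i - n\sum x_i^\beta \log x_i / \sum x_i^\beta = 0$. Here $g$ is strictly decreasing when the $x_i$ are not all equal, since the last ratio is a weighted average of the $\log x_i$ whose derivative in $\beta$ is a weighted variance, so the root is unique and bisection finds it. Bisection is our choice (the text does not prescribe a numerical method), and the function name and the simulated check data are hypothetical. Note that Python's `random.weibullvariate(s, k)` uses scale $s$ and shape $k$, so it corresponds to $\alpha = s^{-k}$, $\beta = k$ in the parametrization used here.

```python
import math
import random


def weibull_mle(data, tol: float = 1e-10):
    """Maximum likelihood estimates (alpha, beta) for F(t) = 1 - exp(-alpha t^beta).

    Solves n/beta + sum log x_i = n sum x_i^beta log x_i / sum x_i^beta
    for beta by bisection, then sets alpha = n / sum x_i^beta.
    """
    if min(data) <= 0 or max(data) == min(data):
        raise ValueError("need positive data that are not all equal")
    n, xmax = len(data), max(data)
    logs = [math.log(x) for x in data]
    sum_log = sum(logs)

    def g(beta: float) -> float:
        # Decreasing in beta; dividing each x by xmax leaves the ratio
        # unchanged and keeps the powers from overflowing.
        w = [(x / xmax) ** beta for x in data]
        return n / beta + sum_log - n * sum(wi * li for wi, li in zip(w, logs)) / sum(w)

    lo, hi = 1e-8, 1.0
    while g(hi) > 0:          # widen the bracket until it contains the root
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol * hi:
        mid = (lo + hi) / 2.0
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    beta = (lo + hi) / 2.0
    return n / sum(x ** beta for x in data), beta


# Hypothetical check: 2,000 draws with scale 10 and shape 2,
# i.e. alpha = 10 ** -2 = 0.01 and beta = 2 in the form used here.
rng = random.Random(3)
sample = [rng.weibullvariate(10.0, 2.0) for _ in range(2000)]
alpha_hat, beta_hat = weibull_mle(sample)
print(f"alpha_hat = {alpha_hat:.5f}, beta_hat = {beta_hat:.3f}")
```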

14.5.1 Parameter Estimation by Least Squares 

Let $X_1, \ldots, X_n$ be a sample from the distribution

$$F(x) = 1 - e^{-\alpha x^\beta}, \qquad x \ge 0$$

Note that

$$\log(1 - F(x)) = -\alpha x^\beta$$

or

$$\log\left(\frac{1}{1 - F(x)}\right) = \alpha x^\beta$$

and so

$$\log \log\left(\frac{1}{1 - F(x)}\right) = \beta \log x + \log \alpha \qquad (14.5.3)$$

Now let $X_{(1)} < X_{(2)} < \cdots < X_{(n)}$ denote the ordered sample values, that is, for $i = 1, \ldots, n$,

$$X_{(i)} = i\text{th smallest of } X_1, \ldots, X_n$$

and suppose that the data result in $X_{(i)} = x_{(i)}$. If we were able to approximate the quantities $\log\log(1/[1 - F(x_{(i)})])$, say by the values $y_1, \ldots, y_n$, then from Equation 14.5.3 we could conclude that

$$y_i \approx \beta \log x_{(i)} + \log \alpha, \qquad i = 1, \ldots, n \qquad (14.5.4)$$

We could then choose $\alpha$ and $\beta$ to minimize the sum of the squared errors; that is, $\alpha$ and $\beta$ are chosen to

$$\text{minimize} \sum_{i=1}^{n} \left(y_i - \beta \log x_{(i)} - \log \alpha\right)^2$$

Indeed, using Proposition 9.2.1 we obtain that the preceding minimum is attained when $\alpha = \hat{\alpha}$, $\beta = \hat{\beta}$ where

$$\hat{\beta} = \frac{\sum_{i=1}^{n} y_i \log x_{(i)} - n\, \overline{\log x}\, \bar{y}}{\sum_{i=1}^{n} (\log x_{(i)})^2 - n\left(\overline{\log x}\right)^2}$$

$$\log \hat{\alpha} = \bar{y} - \hat{\beta}\, \overline{\log x}$$

where

$$\overline{\log x} = \sum_{i=1}^{n} \log x_{(i)} \Big/ n, \qquad \bar{y} = \sum_{i=1}^{n} y_i \Big/ n$$

To utilize the foregoing, we need to be able to determine values $y_i$ that approximate $\log\log(1/[1 - F(x_{(i)})]) = \log[-\log(1 - F(x_{(i)}))]$, $i = 1, \ldots, n$. We now present two different methods for doing this.

Method 1: This method uses the fact that

$$E[F(X_{(i)})] = \frac{i}{n+1} \qquad (14.5.5)$$

and then approximates $F(x_{(i)})$ by $E[F(X_{(i)})]$. Thus, this method calls for using

$$y_i = \log\{-\log(1 - E[F(X_{(i)})])\} \qquad (14.5.6)$$

$$= \log\left\{-\log\left(1 - \frac{i}{n+1}\right)\right\}$$

$$= \log\left\{-\log\left(\frac{n+1-i}{n+1}\right)\right\}$$

Method 2: This method uses the fact that

$$E[-\log(1 - F(X_{(i)}))] = \frac{1}{n} + \frac{1}{n-1} + \frac{1}{n-2} + \cdots + \frac{1}{n-i+1} \qquad (14.5.7)$$

and then approximates $-\log(1 - F(x_{(i)}))$ by the foregoing. Thus, this second method calls for setting

$$y_i = \log\left(\frac{1}{n} + \frac{1}{n-1} + \cdots + \frac{1}{n-i+1}\right) \qquad (14.5.8)$$
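Both ways of choosing the $y_i$ feed the same least squares formulas. The sketch below (function name ours) implements Methods 1 and 2. As a check, it is given hypothetical "data" placed exactly at the quantiles $F^{-1}(i/(n+1))$ of a Weibull distribution with $\alpha = .01$, $\beta = 2$. Method 1 must then recover those parameters exactly, since each point satisfies 14.5.4 with equality, while Method 2, whose $y_i$ differ slightly, gives somewhat different estimates.

```python
import math


def weibull_least_squares(data, method: int = 1):
    """Least squares estimates (alpha, beta) for F(x) = 1 - exp(-alpha x^beta).

    method 1: y_i = log(-log(1 - i/(n+1)))          (Equation 14.5.6)
    method 2: y_i = log(1/n + ... + 1/(n-i+1))       (Equation 14.5.8)
    """
    xs = sorted(data)                      # x_(1) <= ... <= x_(n)
    n = len(xs)
    if method == 1:
        ys = [math.log(-math.log(1.0 - i / (n + 1))) for i in range(1, n + 1)]
    elif method == 2:
        ys, partial = [], 0.0
        for i in range(1, n + 1):
            partial += 1.0 / (n - i + 1)   # adds the term 1/(n-i+1)
            ys.append(math.log(partial))
    else:
        raise ValueError("method must be 1 or 2")
    us = [math.log(x) for x in xs]         # log x_(i)
    ubar, ybar = sum(us) / n, sum(ys) / n
    beta = ((sum(u * y for u, y in zip(us, ys)) - n * ubar * ybar)
            / (sum(u * u for u in us) - n * ubar ** 2))
    alpha = math.exp(ybar - beta * ubar)   # log alpha = ybar - beta * mean(log x)
    return alpha, beta


# Hypothetical data at F^{-1}(i/(n+1)) = (-log(1 - i/(n+1)) / alpha) ** (1/beta)
# with alpha = 0.01 and beta = 2.
n = 25
data = [(-math.log(1 - i / (n + 1)) / 0.01) ** 0.5 for i in range(1, n + 1)]
print(weibull_least_squares(data, method=1))
print(weibull_least_squares(data, method=2))
```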



REMARKS 



(a) It is not, at present, clear which method provides superior estimates of the parameters of the Weibull distribution, and extensive simulation studies will be necessary to determine this.

(b) Proofs of equalities 14.5.5 and 14.5.7 [which hold whenever $X_{(i)}$ is the $i$th smallest of a sample of size $n$ from any continuous distribution $F$] are outlined in Problems 28-30.



Problems 

1. A random variable whose distribution function is given by

$$F(t) = 1 - \exp\{-\alpha t^\beta\}, \qquad t \ge 0$$

is said to have a Weibull distribution with parameters $\alpha$, $\beta$. Compute its failure rate function.

2. If $X$ and $Y$ are independent random variables having failure rate functions $\lambda_X(t)$ and $\lambda_Y(t)$, show that the failure rate function of $Z = \min(X, Y)$ is

$$\lambda_Z(t) = \lambda_X(t) + \lambda_Y(t)$$

3. The lung cancer rate of a $t$-year-old male smoker, $\lambda(t)$, is such that

t-A0^ A 



k(t) = .027 + .025 I 1 , t > 40 

Assuming that a 40-year-old male smoker survives all other hazards, what is 
the probability that he survives to (a) age 50, (b) age 60, without contracting 
lung cancer? In the foregoing we are assuming that he remains a smoker throughout 
his life. 
4. Suppose the life distribution of an item has failure rate function $\lambda(t) = t^3$, $0 < t < \infty$.

(a) What is the probability that the item survives to age 2? 

(b) What is the probability that the item's life is between .4 and 1.4? 

(c) What is the mean life of the item? 

(d) What is the probability a 1-year-old item will survive to age 2? 

5. A continuous life distribution is said to be an IFR (increasing failure rate) distribution if its failure rate function $\lambda(t)$ is nondecreasing in $t$.

(a) Show that the gamma distribution with density

$$f(t) = \lambda^2 t e^{-\lambda t}, \qquad t > 0$$

is IFR.
(b) Show, more generally, that the gamma distribution with parameters $\alpha$, $\lambda$ is IFR whenever $\alpha \ge 1$.
Hint: Write

$$\lambda(t) = \left[\frac{\int_t^\infty \lambda e^{-\lambda s} (\lambda s)^{\alpha-1}\, ds}{\lambda e^{-\lambda t} (\lambda t)^{\alpha-1}}\right]^{-1}$$

6. Show that the uniform distribution on $(a, b)$ is an IFR distribution.

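7. For the model of Section 14.3.1, explain how the following figure can be used to show that

$$\tau = \sum_{j=1}^{r} Y_j$$

where

$$Y_j = (n - j + 1)(X_{(j)} - X_{(j-1)})$$

[Figure: horizontal axis marked $1, 2, 3, \ldots, r-3, r-2, r-1, r, n$; vertical levels marked $X_{(1)}, X_{(2)}, X_{(3)}, \ldots, X_{(r-2)}, X_{(r-1)}, X_{(r)}$.]

(Hint: Argue that both $\tau$ and $\sum_{j=1}^{r} Y_j$ equal the total area of the figure shown.)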
8. When 30 transistors were simultaneously put on a life test that was to be terminated
when the 10th failure occurred, the observed failure times were (in hours) 4.1, 
7.3, 13.2, 18.8, 24.5, 30.8, 38.1, 45.5, 53, 62.2. Assume an exponential life 
distribution. 

(a) What is the maximum likelihood estimate of the mean life of a transistor? 

(b) Compute a 95 percent two-sided confidence interval for the mean life of a 
transistor. 

(c) Determine a value c that we can assert, with 95 percent confidence, is less 
than the mean transistor life. 

(d) Test at the $\alpha = .10$ level of significance the hypothesis that the mean lifetime is 7.5 hours versus the alternative that it is not 7.5 hours.

9. Consider a test of $H_0: \theta = \theta_0$ versus $H_1: \theta \ne \theta_0$ for the model of Section 14.3.1. Suppose that the observed value of $2\tau/\theta_0$ is $v$. Show that the hypothesis should be rejected at significance level $\alpha$ whenever $\alpha$ is at least as large as the p-value given by

$$p\text{-value} = 2\min\left(P\{\chi^2_{2r} < v\},\ 1 - P\{\chi^2_{2r} < v\}\right)$$

where $\chi^2_{2r}$ is a chi-square random variable with $2r$ degrees of freedom.

10. Suppose 30 items are put on test that is scheduled to stop when the 8th failure 
occurs. If the failure times are, in hours, .35, .73, .99, 1.40, 1.45, 1.83, 2.20, 
2.72, test, at the 5 percent level of significance, the hypothesis that the mean life 
is equal to 10 hours. Assume that the underlying distribution is exponential. 

11. Suppose that 20 items are to be put on test that is to be terminated when the 
10th failure occurs. If the lifetime distribution is exponential with mean 10 hours, 
compute the following quantities. 

(a) The mean length of the testing period. 

(b) The variance of the testing period. 

12. Vacuum tubes produced at a certain plant are assumed to have an underlying exponential life distribution having an unknown mean $\theta$. To estimate $\theta$ it has been decided to put a certain number $n$ of tubes on test and to stop the test at the 10th failure. If the plant officials want the mean length of the testing period to be 3 hours when the value of $\theta$ is $\theta = 20$, approximately how large should $n$ be?

13. A one-at-a-time sequential life testing scheme is scheduled to run for 300 hours. A total of 16 items fail within that time. Assuming an exponential life distribution with unknown mean $\theta$ (measured in hours):

(a) Determine the maximum likelihood estimate of $\theta$.

(b) Test at the .05 level of significance the hypothesis that $\theta = 20$ versus the alternative that $\theta \ne 20$.

(c) Determine a 95 percent confidence interval for $\theta$.

14. Using the fact that a Poisson process results when the times between successive events are independent and identically distributed exponential random variables, show that

$$P\{X \ge n\} = F_{\chi^2_{2n}}(x)$$

when $X$ is a Poisson random variable with mean $x/2$ and $F_{\chi^2_{2n}}$ is the chi-square distribution function with $2n$ degrees of freedom. (Hint: Use the results of Section 14.3.2.)

15. From a sample of items having an exponential life distribution with unknown mean $\theta$, items are tested in sequence. The testing continues until either the $r$th failure occurs or after a time $T$ elapses.

(a) Determine the likelihood function.

(b) Verify that the maximum likelihood estimator of $\theta$ is equal to the total time on test of all items divided by the number of observed failures.

16. Verify that the maximum likelihood estimate corresponding to Equation 14.3.9 
is given by Equation 14.3.10. 

17. A testing laboratory has facilities to simultaneously life test 5 components. The 
lab tested a sample of 10 components from a common exponential distribution 
by initially putting 5 on test and then replacing any failed component by one still 
waiting to be tested. The test was designed to end either at 200 hours or when all 
10 components had failed. If there were a total of 9 failures occurring at times 15, 
28.2, 46, 62.2, 76, 86, 128, 153, 197, what is the maximum likelihood estimate 
of the mean life of a component? 

18. Suppose that the remission time, in weeks, of leukemia patients that have undergone a certain type of chemotherapy treatment is an exponential random variable having an unknown mean $\theta$. A group of 20 such patients is being monitored and, at present, their remission times are (in weeks) 1.2, 1.8*, 2.2, 4.1, 5.6, 8.4, 11.8*, 13.4*, 16.2, 21.7, 29*, 41, 42*, 42.4*, 49.3, 60.5, 61*, 94, 98, 99.2* where an asterisk next to the data means that the patient's remission is still continuing, whereas a data point without an asterisk means that the remission ended at that time. What is the maximum likelihood estimate of $\theta$?

19. In Problem 17, suppose that prior to the testing phase and based on past experience one felt that the value of $\lambda = 1/\theta$ could be thought of as the outcome of a gamma random variable with parameters 1, 100. What is the Bayes estimate of $\lambda$?

20. What is the Bayes estimate of $\lambda = 1/\theta$ in Problem 18 if the prior distribution on $\lambda$ is exponential with mean 1/30?

21. The following data represent failure times, in minutes, for two types of electrical insulation subject to a certain voltage stress.

Type I: 212, 88.5, 122.3, 116.4, 125, 132, 66
Type II: 34.6, 54, 162, 49, 78, 121, 128

Test the hypothesis that the two sets of data come from the same exponential distribution.

22. Suppose that the life distributions of two types of transistors are both exponential. To test the equality of means of these two distributions, $n_1$ type 1 transistors are simultaneously put on a life test that is scheduled to end when there have been a total of $r_1$ failures. Similarly, $n_2$ type 2 transistors are simultaneously put on a life test that is to end when there have been $r_2$ failures.

(a) Using results from Section 14.3.1, show how the hypothesis that the means are equal can be tested by using a test statistic that, when the means are equal, has an F-distribution with $2r_1$ and $2r_2$ degrees of freedom.

(b) Suppose $n_1 = 20$, $r_1 = 10$ and $n_2 = 10$, $r_2 = 7$ with the following data resulting.

Type 1 failures at times:

10.4, 23.2, 31.4, 45, 61.1, 69.6, 81.3, 95.2, 112, 129.4
Type 2 failures at times:

6.1, 13.8, 21.2, 31.6, 46.4, 66.7, 92.4

What is the smallest significance level $\alpha$ for which the hypothesis of equal means would be rejected? (That is, what is the p-value of the test data?)

23. If $X$ is a Weibull random variable with parameters $(\alpha, \beta)$, show that

$$E[X] = \alpha^{-1/\beta}\, \Gamma(1 + 1/\beta)$$

where $\Gamma(y)$ is the gamma function defined by

$$\Gamma(y) = \int_0^\infty e^{-x} x^{y-1}\, dx$$

Hint: Write

$$E[X] = \int_0^\infty t\, \alpha\beta t^{\beta-1} \exp\{-\alpha t^\beta\}\, dt$$

and make the change of variables

$$x = \alpha t^\beta, \qquad dx = \alpha\beta t^{\beta-1}\, dt$$

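24. Show that if $X$ is a Weibull random variable with parameters $(\alpha, \beta)$, then

$$\operatorname{Var}(X) = \alpha^{-2/\beta}\left[\Gamma\left(1 + \frac{2}{\beta}\right) - \left(\Gamma\left(1 + \frac{1}{\beta}\right)\right)^2\right]$$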

25. If the following are the sample data from a Weibull population having unknown parameters $\alpha$ and $\beta$, determine the least squares estimates of these quantities, using either of the methods presented.

Data: 15.4, 16.8, 6.2, 10.6, 21.4, 18.2, 1.6, 12.5, 19.4, 17

26. Show that if $X$ is a Weibull random variable with parameters $(\alpha, \beta)$, then $\alpha X^\beta$ is an exponential random variable with mean 1.

27. If $U$ is uniformly distributed on $(0, 1)$, that is, $U$ is a random number, show that $[-(1/\alpha)\log U]^{1/\beta}$ is a Weibull random variable with parameters $(\alpha, \beta)$.

The next three problems are concerned with verifying Equations 14.5.5 and 
14.5.7. 

28. If X is a continuous random variable having distribution function F, show that 

(a) F(X) is uniformly distributed on (0, 1); 

(b) 1 — F(X) is uniformly distributed on (0, 1). 

29. Let $X_{(i)}$ denote the $i$th smallest of a sample of size $n$ from a continuous distribution function $F$. Also, let $U_{(i)}$ denote the $i$th smallest from a sample of size $n$ from a uniform (0, 1) distribution.

(a) Argue that the density function of $U_{(i)}$ is given by

$$f_{U_{(i)}}(t) = \frac{n!}{(n-i)!\,(i-1)!}\, t^{i-1} (1-t)^{n-i}, \qquad 0 < t < 1$$

[Hint: In order for the $i$th smallest of $n$ uniform (0, 1) random variables to equal $t$, how many must be less than $t$ and how many must be greater? Also, in how many ways can a set of $n$ elements be broken into three subsets of respective sizes $i - 1$, 1, and $n - i$?]
(b) Use part (a) to show that $E[U_{(i)}] = i/(n + 1)$. [Hint: To evaluate the resulting integral, use the fact that the density in part (a) must integrate to 1.]

(c) Use part (b) and Problem 28a to conclude that $E[F(X_{(i)})] = i/(n + 1)$.

30. If $U$ is uniformly distributed on (0, 1), show that $-\log U$ has an exponential distribution with mean 1. Now use Equation 14.3.7 and the results of the previous problems to establish Equation 14.5.7.
TABLE A Standard Normal Distribution Function: $\Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-y^2/2}\, dy$

x    .00   .01   .02   .03   .04   .05   .06   .07   .08   .09
.0   .5000 .5040 .5080 .5120 .5160 .5199 .5239 .5279 .5319 .5359
.1   .5398 .5438 .5478 .5517 .5557 .5596 .5636 .5675 .5714 .5753
.2   .5793 .5832 .5871 .5910 .5948 .5987 .6026 .6064 .6103 .6141
.3   .6179 .6217 .6255 .6293 .6331 .6368 .6406 .6443 .6480 .6517
.4   .6554 .6591 .6628 .6664 .6700 .6736 .6772 .6808 .6844 .6879
.5   .6915 .6950 .6985 .7019 .7054 .7088 .7123 .7157 .7190 .7224
.6   .7257 .7291 .7324 .7357 .7389 .7422 .7454 .7486 .7517 .7549
.7   .7580 .7611 .7642 .7673 .7704 .7734 .7764 .7794 .7823 .7852
.8   .7881 .7910 .7939 .7967 .7995 .8023 .8051 .8078 .8106 .8133
.9   .8159 .8186 .8212 .8238 .8264 .8289 .8315 .8340 .8365 .8389
1.0  .8413 .8438 .8461 .8485 .8508 .8531 .8554 .8577 .8599 .8621
1.1  .8643 .8665 .8686 .8708 .8729 .8749 .8770 .8790 .8810 .8830
1.2  .8849 .8869 .8888 .8907 .8925 .8944 .8962 .8980 .8997 .9015
1.3  .9032 .9049 .9066 .9082 .9099 .9115 .9131 .9147 .9162 .9177
1.4  .9192 .9207 .9222 .9236 .9251 .9265 .9279 .9292 .9306 .9319
1.5  .9332 .9345 .9357 .9370 .9382 .9394 .9406 .9418 .9429 .9441
1.6  .9452 .9463 .9474 .9484 .9495 .9505 .9515 .9525 .9535 .9545
1.7  .9554 .9564 .9573 .9582 .9591 .9599 .9608 .9616 .9625 .9633
1.8  .9641 .9649 .9656 .9664 .9671 .9678 .9686 .9693 .9699 .9706
1.9  .9713 .9719 .9726 .9732 .9738 .9744 .9750 .9756 .9761 .9767
2.0  .9772 .9778 .9783 .9788 .9793 .9798 .9803 .9808 .9812 .9817
2.1  .9821 .9826 .9830 .9834 .9838 .9842 .9846 .9850 .9854 .9857
2.2  .9861 .9864 .9868 .9871 .9875 .9878 .9881 .9884 .9887 .9890
2.3  .9893 .9896 .9898 .9901 .9904 .9906 .9909 .9911 .9913 .9916
2.4  .9918 .9920 .9922 .9925 .9927 .9929 .9931 .9932 .9934 .9936
2.5  .9938 .9940 .9941 .9943 .9945 .9946 .9948 .9949 .9951 .9952
2.6  .9953 .9955 .9956 .9957 .9959 .9960 .9961 .9962 .9963 .9964
2.7  .9965 .9966 .9967 .9968 .9969 .9970 .9971 .9972 .9973 .9974
2.8  .9974 .9975 .9976 .9977 .9977 .9978 .9979 .9979 .9980 .9981
2.9  .9981 .9982 .9982 .9983 .9984 .9984 .9985 .9985 .9986 .9986
3.0  .9987 .9987 .9987 .9988 .9988 .9989 .9989 .9989 .9990 .9990


3.1 


.9990 


.9991 


.9991 


.9991 


.9992 


.9992 


.9992 


.9992 


.9993 


.9993 


5.2 


.9993 


.9993 


.9994 


.9994 


.9994 


.9994 


.9994 


.9995 


.9995 


.9995 


3.3 


.9995 


.9995 


.9995 


.9996 


.9996 


.9996 


.9996 


.9996 


.9996 


.9997 


3.4 


.9997 


.9997 


.9997 


.9997 


.9997 


.9997 


.9997 


.9997 


.9997 


.9998 
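Entries of the standard normal table above can be recomputed from the error function, since Φ(x) = (1 + erf(x/√2))/2. The sketch below assumes only Python's `math` module; the helper names `std_normal_cdf` and `table_row` are our own. Values for negative x follow from the symmetry Φ(−x) = 1 − Φ(x).

```python
import math

def std_normal_cdf(x):
    """Phi(x) = P{Z <= x} for a standard normal Z, via the error function:
    Phi(x) = (1 + erf(x / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def table_row(row_start):
    """The ten entries Phi(row_start + .00), ..., Phi(row_start + .09),
    rounded to four decimals as in the printed table."""
    return [round(std_normal_cdf(row_start + k / 100), 4) for k in range(10)]

if __name__ == "__main__":
    for start in (0.0, 1.0, 1.9):
        print(f"{start:.1f}", table_row(start))
```

Computing Φ directly in this way avoids interpolating between table entries.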
TABLE A2 Values of χ²_{α,n}

n    α = .995   α = .99   α = .975   α = .95   α = .05   α = .025   α = .01   α = .005
1    .0000393   .000157   .000982    .00393    3.841     5.024      6.635     7.879
2    .0100      .0201     .0506      .103      5.991     7.378      9.210     10.597
3    .0717      .115      .216       .352      7.815     9.348      11.345    12.838
4    .207       .297      .484       .711      9.488     11.143     13.277    14.860
5    .412       .554      .831       1.145     11.070    12.832     15.086    16.750
6    .676       .872      1.237      1.635     12.592    14.449     16.812    18.548
7    .989       1.239     1.690      2.167     14.067    16.013     18.475    20.278
8    1.344      1.646     2.180      2.733     15.507    17.535     20.090    21.955
9    1.735      2.088     2.700      3.325     16.919    19.023     21.666    23.589
10   2.156      2.558     3.247      3.940     18.307    20.483     23.209    25.188
11   2.603      3.053     3.816      4.575     19.675    21.920     24.725    26.757
12   3.074      3.571     4.404      5.226     21.026    23.337     26.217    28.300
13   3.565      4.107     5.009      5.892     22.362    24.736     27.688    29.819
14   4.075      4.660     5.629      6.571     23.685    26.119     29.141    31.319
15   4.601      5.229     6.262      7.261     24.996    27.488     30.578    32.801
16   5.142      5.812     6.908      7.962     26.296    28.845     32.000    34.267
17   5.697      6.408     7.564      8.672     27.587    30.191     33.409    35.718
18   6.265      7.015     8.231      9.390     28.869    31.526     34.805    37.156
19   6.844      7.633     8.907      10.117    30.144    32.852     36.191    38.582
20   7.434      8.260     9.591      10.851    31.410    34.170     37.566    39.997
21   8.034      8.897     10.283     11.591    32.671    35.479     38.932    41.401
22   8.643      9.542     10.982     12.338    33.924    36.781     40.289    42.796
23   9.260      10.196    11.689     13.091    35.172    38.076     41.638    44.181
24   9.886      10.856    12.401     13.848    36.415    39.364     42.980    45.558
25   10.520     11.524    13.120     14.611    37.652    40.646     44.314    46.928
26   11.160     12.198    13.844     15.379    38.885    41.923     45.642    48.290
27   11.808     12.879    14.573     16.151    40.113    43.194     46.963    49.645
28   12.461     13.565    15.308     16.928    41.337    44.461     48.278    50.993
29   13.121     14.256    16.047     17.708    42.557    45.772     49.588    52.336
30   13.787     14.953    16.791     18.493    43.773    46.979     50.892    53.672

Other Chi-Square Probabilities:

χ²_{.9,9} = 4.2   P{χ²_16 < 14.3} = .425   P{χ²_11 < 17.1875} = .8976.
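Chi-square values such as those in Table A2 can be recomputed without special libraries. The sketch below is one possible approach, assuming only Python's `math` module: the distribution function is the regularized lower incomplete gamma function P(n/2, x/2), summed from its power series, and χ²_{α,n} is found by bisection so that P{χ²_n ≥ χ²_{α,n}} = α. The names `chi2_cdf` and `chi2_value` and the bisection bracket are our own choices; this is not claimed to be how the printed table was produced.

```python
import math

def chi2_cdf(x, n):
    """P{chi^2_n <= x}: the regularized lower incomplete gamma function
    P(a, z) with a = n/2, z = x/2, from the power series
        P(a, z) = z^a e^(-z) / Gamma(a+1) * sum_k z^k / ((a+1)(a+2)...(a+k))."""
    if x <= 0:
        return 0.0
    a, z = n / 2.0, x / 2.0
    term, total, k = 1.0, 1.0, 0
    # Terms grow while a + k < z, then shrink; stop once they are negligible.
    while term > 1e-17 * total and k < 100_000:
        k += 1
        term *= z / (a + k)
        total += term
    return math.exp(a * math.log(z) - z - math.lgamma(a + 1)) * total

def chi2_value(alpha, n, iters=200):
    """chi^2_{alpha,n}, the point with P{chi^2_n >= chi^2_{alpha,n}} = alpha,
    located by bisection on the increasing function chi2_cdf(., n)."""
    lo, hi = 0.0, 100.0 + 10.0 * n   # generous bracket for the alpha, n in Table A2
    target = 1.0 - alpha
    for _ in range(iters):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, n) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

if __name__ == "__main__":
    for alpha in (0.995, 0.99, 0.975, 0.95, 0.05, 0.025, 0.01, 0.005):
        print(alpha, round(chi2_value(alpha, 10), 3))
```

For n = 2 the series reduces to the closed form 1 − e^(−x/2), which gives a convenient check of the implementation.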
TABLE A3 Values of t_{α,n}

n    α = .10   α = .05   α = .025   α = .01   α = .005
1    3.078     6.314     12.706     31.821    63.657
2    1.886     2.920     4.303      6.965     9.925
3    1.638     2.353     3.182      4.541     5.841
4    1.533     2.132     2.776      3.747     4.604
5    1.476     2.015     2.571      3.365     4.032
6    1.440     1.943     2.447      3.143     3.707
7    1.415     1.895     2.365      2.998     3.499
8    1.397     1.860     2.306      2.896     3.355
9    1.383     1.833     2.262      2.821     3.250
10   1.372     1.812     2.228      2.764     3.169
11   1.363     1.796     2.201      2.718     3.106
12   1.356     1.782     2.179      2.681     3.055
13   1.350     1.771     2.160      2.650     3.012
14   1.345     1.761     2.145      2.624     2.977
15   1.341     1.753     2.131      2.602     2.947
16   1.337     1.746     2.120      2.583     2.921
17   1.333     1.740     2.110      2.567     2.898
18   1.330     1.734     2.101      2.552     2.878
19   1.328     1.729     2.093      2.539     2.861
20   1.325     1.725     2.086      2.528     2.845
21   1.323     1.721     2.080      2.518     2.831
22   1.321     1.717     2.074      2.508     2.819
23   1.319     1.714     2.069      2.500     2.807
24   1.318     1.711     2.064      2.492     2.797
25   1.316     1.708     2.060      2.485     2.787
26   1.315     1.706     2.056      2.479     2.779
27   1.314     1.703     2.052      2.473     2.771
28   1.313     1.701     2.048      2.467     2.763
29   1.311     1.699     2.045      2.462     2.756
∞    1.282     1.645     1.960      2.326     2.576


Other t Probabilities:

P{T_8 < 2.541} = .9825   P{T_8 < 2.7} = .9864   P{T_11 < .7635} = .77   P{T_11 < .934} = .81   P{T_11 < 1.66} = .94   P{T_11 < 2.8} = .984.
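Values of t_{α,n} can be recomputed numerically. The following is a minimal sketch assuming only Python's `math` module: it integrates the t density with Simpson's rule (using symmetry about 0) and inverts the distribution function by bisection. The names `t_cdf` and `t_value`, the number of Simpson steps, and the bisection bracket are illustrative assumptions rather than anything prescribed by the text.

```python
import math

def t_cdf(t, n, steps=2000):
    """P{T_n <= t} for a t random variable with n degrees of freedom.
    Uses symmetry about 0 and Simpson's rule for P{0 <= T_n <= |t|}."""
    c = math.exp(math.lgamma((n + 1) / 2) - math.lgamma(n / 2)
                 - 0.5 * math.log(n * math.pi))

    def density(x):
        return c * (1 + x * x / n) ** (-(n + 1) / 2)

    a = abs(t)
    h = a / steps                      # steps must be even for Simpson's rule
    s = density(0.0) + density(a)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * density(k * h)
    area = s * h / 3                   # approximates P{0 <= T_n <= |t|}
    return 0.5 + area if t >= 0 else 0.5 - area

def t_value(alpha, n, iters=60):
    """t_{alpha,n}, the point with P{T_n >= t_{alpha,n}} = alpha
    (0 < alpha < 1/2), located by bisection."""
    lo, hi = 0.0, 200.0                # generous bracket for the entries of Table A3
    target = 1.0 - alpha
    for _ in range(iters):
        mid = (lo + hi) / 2
        if t_cdf(mid, n) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

if __name__ == "__main__":
    print([round(t_value(alpha, 10), 3) for alpha in (0.10, 0.05, 0.025, 0.01, 0.005)])
```

For n = 1 the t distribution is the Cauchy distribution, whose distribution function 1/2 + arctan(t)/π offers an independent check.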
TABLE A4 Values of F_{.05,n,m}

m = Degrees of Freedom for Denominator (rows); n = Degrees of Freedom for Numerator (columns)

m      n = 1   n = 2   n = 3   n = 4   n = 5
1      161     200     216     225     230
2      18.50   19.00   19.20   19.20   19.30
3      10.10   9.55    9.28    9.12    9.01
4      7.71    6.94    6.59    6.39    6.26
5      6.61    5.79    5.41    5.19    5.05
6      5.99    5.14    4.76    4.53    4.39
7      5.59    4.74    4.35    4.12    3.97
8      5.32    4.46    4.07    3.84    3.69
9      5.12    4.26    3.86    3.63    3.48
10     4.96    4.10    3.71    3.48    3.33
11     4.84    3.98    3.59    3.36    3.20
12     4.75    3.89    3.49    3.26    3.11
13     4.67    3.81    3.41    3.18    3.03
14     4.60    3.74    3.34    3.11    2.96
15     4.54    3.68    3.29    3.06    2.90
16     4.49    3.63    3.24    3.01    2.85
17     4.45    3.59    3.20    2.96    2.81
18     4.41    3.55    3.16    2.93    2.77
19     4.38    3.52    3.13    2.90    2.74
20     4.35    3.49    3.10    2.87    2.71
21     4.32    3.47    3.07    2.84    2.68
22     4.30    3.44    3.05    2.82    2.66
23     4.28    3.42    3.03    2.80    2.64
24     4.26    3.40    3.01    2.78    2.62
25     4.24    3.39    2.99    2.76    2.60
30     4.17    3.32    2.92    2.69    2.53
40     4.08    3.23    2.84    2.61    2.45
60     4.00    3.15    2.76    2.53    2.37
120    3.92    3.07    2.68    2.45    2.29
∞      3.84    3.00    2.60    2.37    2.21



Other F Probabilities: 

F X7t5 = .337 P{Fjj < 1.376} - .316 P{F 20 ,\4 < 2.461} - .911 P{F %4 < .5} - .1782. 
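Entries of Table A4 can likewise be recomputed. The sketch below is one possible approach, assuming only Python's `math` module: it integrates the F density numerically (Simpson's rule, after the substitution y = u², which removes the infinite density at 0 when n = 1) and inverts the result by bisection. The names `f_cdf` and `f_value`, the step count, and the bracket are our own; this is not claimed to be how the printed table was produced.

```python
import math

def f_cdf(x, n, m, steps=2000):
    """P{F_{n,m} <= x} for an F random variable with n numerator and m
    denominator degrees of freedom.  The density
        c * y^(n/2 - 1) * (1 + n*y/m)^(-(n+m)/2),
        c = Gamma((n+m)/2) / (Gamma(n/2) Gamma(m/2)) * (n/m)^(n/2),
    is integrated over (0, x] by Simpson's rule after substituting y = u*u."""
    if x <= 0:
        return 0.0
    c = math.exp(math.lgamma((n + m) / 2) - math.lgamma(n / 2) - math.lgamma(m / 2)
                 + (n / 2) * math.log(n / m))

    def g(u):
        # density at y = u*u times dy/du = 2u, i.e. 2c u^(n-1) (1 + n u^2/m)^(-(n+m)/2)
        return 2 * c * u ** (n - 1) * (1 + n * u * u / m) ** (-(n + m) / 2)

    b = math.sqrt(x)
    h = b / steps                      # steps must be even for Simpson's rule
    s = g(0.0) + g(b)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * g(k * h)
    return s * h / 3

def f_value(alpha, n, m, iters=60):
    """F_{alpha,n,m}, the point with P{F_{n,m} >= F_{alpha,n,m}} = alpha,
    located by bisection."""
    lo, hi = 0.0, 1000.0               # generous bracket for the entries of Table A4
    target = 1.0 - alpha
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f_cdf(mid, n, m) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

if __name__ == "__main__":
    for m in (5, 10, 30):
        print(m, [round(f_value(0.05, n, m), 2) for n in range(1, 6)])
```

When n = m = 2 the distribution function is x/(1 + x) exactly, which gives a simple check of the integration.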
TABLE A5 Values of C(m, d, α)

                   m
d     α      2     3     4     5     6     7     8     9     10     11
5     .05    3.64  4.60  5.22  5.67  6.03  6.33  6.58  6.80  6.99   7.17
      .01    5.70  6.98  7.80  8.42  8.91  9.32  9.67  9.97  10.24  10.48
6     .05    3.46  4.34  4.90  5.30  5.63  5.90  6.12  6.32  6.49   6.65
      .01    5.24  6.33  7.03  7.56  7.97  8.32  8.61  8.87  9.10   9.30
7     .05    3.34  4.16  4.68  5.06  5.36  5.61  5.82  6.00  6.16   6.30
      .01    4.95  5.92  6.54  7.01  7.37  7.68  7.94  8.17  8.37   8.55
8     .05    3.26  4.04  4.53  4.89  5.17  5.40  5.60  5.77  5.92   6.05
      .01    4.75  5.64  6.20  6.62  6.96  7.24  7.47  7.68  7.86   8.03
9     .05    3.20  3.95  4.41  4.76  5.02  5.24  5.43  5.59  5.74   5.87
      .01    4.60  5.43  5.96  6.35  6.66  6.91  7.13  7.33  7.49   7.65
10    .05    3.15  3.88  4.33  4.65  4.91  5.12  5.30  5.46  5.60   5.72
      .01    4.48  5.27  5.77  6.14  6.43  6.67  6.87  7.05  7.21   7.36
11    .05    3.11  3.82  4.26  4.57  4.82  5.03  5.20  5.35  5.49   5.61
      .01    4.39  5.15  5.62  5.97  6.25  6.48  6.67  6.84  6.99   7.13
12    .05    3.08  3.77  4.20  4.51  4.75  4.95  5.12  5.27  5.39   5.51
      .01    4.32  5.05  5.50  5.84  6.10  6.32  6.51  6.67  6.81   6.94
13    .05    3.06  3.73  4.15  4.45  4.69  4.88  5.05  5.19  5.32   5.43
      .01    4.26  4.96  5.40  5.73  5.98  6.19  6.37  6.53  6.67   6.79
14    .05    3.03  3.70  4.11  4.41  4.64  4.83  4.99  5.13  5.25   5.36
      .01    4.21  4.89  5.32  5.63  5.88  6.08  6.26  6.41  6.54   6.66
15    .05    3.01  3.67  4.08  4.37  4.59  4.78  4.94  5.08  5.20   5.31
      .01    4.17  4.84  5.25  5.56  5.80  5.99  6.16  6.31  6.44   6.55
16    .05    3.00  3.65  4.05  4.33  4.56  4.74  4.90  5.03  5.15   5.26
      .01    4.13  4.79  5.19  5.49  5.72  5.92  6.08  6.22  6.35   6.46
17    .05    2.98  3.63  4.02  4.30  4.52  4.70  4.86  4.99  5.11   5.21
      .01    4.10  4.74  5.14  5.43  5.66  5.85  6.01  6.15  6.27   6.38
18    .05    2.97  3.61  4.00  4.28  4.49  4.67  4.82  4.96  5.07   5.17
      .01    4.07  4.70  5.09  5.38  5.60  5.79  5.94  6.08  6.20   6.31
19    .05    2.96  3.59  3.98  4.25  4.47  4.65  4.79  4.92  5.04   5.14
      .01    4.05  4.67  5.05  5.33  5.55  5.73  5.89  6.02  6.14   6.25
20    .05    2.95  3.58  3.96  4.23  4.45  4.62  4.77  4.90  5.01   5.11
      .01    4.02  4.64  5.02  5.29  5.51  5.69  5.84  5.97  6.09   6.19
24    .05    2.92  3.53  3.90  4.17  4.37  4.54  4.68  4.81  4.92   5.01
      .01    3.96  4.55  4.91  5.17  5.37  5.54  5.69  5.81  5.92   6.02
30    .05    2.89  3.49  3.85  4.10  4.30  4.46  4.60  4.72  4.82   4.92
      .01    3.89  4.45  4.80  5.05  5.24  5.40  5.54  5.65  5.76   5.85
40    .05    2.86  3.44  3.79  4.04  4.23  4.39  4.52  4.63  4.73   4.82
      .01    3.82  4.37  4.70  4.93  5.11  5.26  5.39  5.50  5.60   5.69
60    .05    2.83  3.40  3.74  3.98  4.16  4.31  4.44  4.55  4.65   4.73
      .01    3.76  4.28  4.59  4.82  4.99  5.13  5.25  5.36  5.45   5.53
120   .05    2.80  3.36  3.68  3.92  4.10  4.24  4.36  4.47  4.56   4.64
      .01    3.70  4.20  4.50  4.71  4.87  5.01  5.12  5.21  5.30   5.37
∞     .05    2.77  3.31  3.63  3.86  4.03  4.17  4.29  4.39  4.47   4.55
      .01    3.64  4.12  4.40  4.60  4.76  4.88  4.99  5.08  5.16   5.23




Index 



A

Analysis of variance (ANOVA)
applications, 439-440 
multiple comparison constants, 
616 

one-way, 440 

between samples sum of squares, 

445-446, 453 
multiple comparisons of sample means, 

450-452 
null hypothesis, 442, 445-446 
sum of squares identity, 447-450 
unequal sample sizes, 452-453 
within samples sum of squares, 443-445, 
452-453 
overview, 440-442 
two-way, 440, 454-457 

column sum of squares, 461 
error sum of squares, 459, 465 
grand mean, 456, 463 
hypothesis testing, 458-462 
null hypothesis, 464, 467-468 
parameter estimation, 458-459 
row and column interaction, 

463-468 
row sum of squares, 460-461
Approximately normal data set, 31 
Assignable cause, 545 
Attribute, quality control, 545 
Axioms of probability, 59-61



B 

Bar graph, 1 

Basic principle of counting 

generalized, 63 

proof, 62-63 
Bayes, T., 75 
Bayes estimator, 272-274 

Bernoulli random variables, 273-274

normal mean, 274-275

life testing, 596-598 
Bayes formula, 70-76 
Behrens-Fisher problem, 319 
Bernoulli, J., 5, 141 
Bernoulli random variable, 141, 157-158 

approximate confidence interval for mean of 
distribution, 260-264 

Bayes estimator, 273-274 

testing equality of parameters in two Bernoulli 
populations, 327-329 
Beta distribution, 274 

Between samples sum of squares, 445-446, 453
Bias of an estimator, 267, 272 
Bimodal data set, 33-34 
Binomial distribution function, 147-148 
Binomial random variable 

definition, 141 

hypergeometric random variable relationship, 
159-160, 219-220

probability calculations, 142-146 

probability mass function, 142 
Binomial random variable (continued)

testing equality of parameters in two Bernoulli 
populations, 327-329 
Binomial theorem, 142 
Box plot, 27 



C

Central limit theorem, 204-206, 209-210
approximate distribution of sample mean, 

210-212 
sample size requirements, 212-213
Chance variation, 545 
Chebyshev's inequality 
for data sets 

definition, 9, 27-28 

one-sided Chebyshev inequality, 

29-30 
proof, 28 
probabilities, 127-129 
Chi-square distribution 
definition, 185-186 
probabilities for random variables, 613 
relation between chi-square and gamma 

random variables, 187-189 
sum of squares of residuals in linear regression, 

358-359 
summation of random variables, 216 
Class interval, 14-15 
Coefficient of determination, 376-378 
Coefficient of multiple determination, 405 
Column sum of squares, 461 
Combinations of objects, 65 
Composite hypothesis, 292 
Conditional distribution, 105-107 
Conditional probability, 67-70 
Confidence interval estimators, 241-242 
Bernoulli mean, 260-264 
difference of two normal means, 

253-260 
exponential mean, 265-266
interpretation, 245-246 
normal mean with unknown variance, 

246-249, 251
one-sided lower, 242-243, 249 
one-sided upper, 242-243, 248 



regression parameters 

a, 370 

b, 365-366 

mean response, 372-373 
two-sided confidence interval, 242, 

244-245 
variances of a normal distribution, 
251-253 
Contingency tables 

with fixed marginal totals, 499-504 
tests of independence, 495-499 
Continuous random variable, 91, 93 
Control charts 

cumulative sum charts, 571-573 
estimation of mean and variance, 

549-551 
exponentially weighted moving-average charts,

565-570 
for fraction defective, 557-559 
lower control limit, 547-548, 552-553, 

555-559, 562 
moving-average charts, 563-565 
for number of defects, 559-562 
S-charts, 554 
upper control limit, 547-548, 552-553, 

555-559, 562 
X-charts, 546-554 
Correlation coefficient, see Sample correlation 

coefficient 
Covariance 

definition, 121-122
multiple linear regression, 399-401
properties, 122-123
sums of random variables, 125-126
Cumulative sum control charts, 571-573 



D

de Moivre, A., 168
DeMorgan's laws, 59 
Dependent variable, 351 
Descriptive statistics, 1-2, 9
Discrete random variable, 91-92
Distribution function, 91-93 
Doll, R., 17 
Double-blind test, 164 
E

Effect of column, 464

Effect of row, 464 

Empirical rule, 32-33 

Entropy, 109-111 

Error sum of squares, 459, 465 

Estimate, 230 

Estimated regression line, 354 

Estimator, 230 

Event 

algebraic operations, 58-59 

axioms of probability, 59-61

complement, 57 

definition, 56 

independent events, 76-80 

intersection of events, 57 

mutually exclusive events, 57 

union of events, 57 
Expectation, see Expected value 
Expected value 

calculation, 107-109 

definition, 107 

expectation of a function of a random variable, 
113-115 

nomenclature, 115 

properties, 111-113 

sums of random variables, 115-118
Exponentially weighted moving-average control

charts, 565-570 
Exponential random variable 

confidence interval for mean of distribution, 
260-264 

definition, 175-176 

memoryless property, 176-178

moment generating function, 176

Poisson process, 179-181

properties, 176-179 



F

Failure rate, 239, 581

functions, 581-584 
Finite population sampling, 217-221 
First moment, 115 
Fisher, R. A., 6 
Fisher-Irwin test, 328-329 



F-random variable

distribution, 191-192 

probabilities for, 615 
Frequency interpretation of probability, 55 
Frequency polygon, 10 
Frequency table, 10 



G

Galton, F., 6, 366
Gamma distribution 
definition, 182 
moment generating function of gamma 

random variable, 183 
properties of gamma random variables, 

183-185 
relation between chi-square and 
gamma random variables, 
187-189 
Gamma function, 183
Gauss, K. F., 5 

Generalized basic principle of counting, 63 
Goodness of fit tests, 483 

critical region determination by simulation, 

490-493 
Kolmogorov-Smirnov test, 504-508 
specified parameters, 484-489 
tests of independence in contingency tables, 
495-499 
with fixed marginal totals, 499-504 
unspecified parameters, 493-495 
Gosset, W. S., 6 
Grand mean, 456, 463 
Graphs 

bar graph, 10, 16 
frequency polygon, 10 
line graph, 10 

relative frequency graph, 10, 12 
Graunt, J., 4-5 

H 

Halley, E., 5 

Hazard rate, see Failure rate 

Herbst, A., 329 

Hill, A. B., 17 






Histogram 

bimodal data set, 33-34 

definition, 16 

normal data set, 31
Hotel, D. G., 21 
Hypergeometric random variable, 156-157

binomial random variable relationship,
159-160, 219-220

mean, 157

variance, 157-158
Hypothesis test, see Statistical hypothesis test 



I

Independent events, 76-80
Independent random variables, 101-105
Independent variable, 351
Indicator random variable, 90-91

expectation, 109 

variance, 120 
Inferential statistics, 2-3
Information theory, entropy, 108 
Interaction of row and column in analysis of 

variance, 464 
Interval estimates, 240 

J 

Joint distribution, sample mean and sample 
variance in normal population, 
215-217 
Jointly continuous random variables, 99
Jointly distributed random variables, 95-101
Joint probability density function, 99-100 
Joint probability mass function, 96 



K

Kolmogorov's law of fragmentation, 237-238
Kolmogorov-Smirnov goodness of fit test,

504-508
Kolmogorov-Smirnov test statistic, 504-507



L

Laplace, P., 5

Least squares estimators in linear regression 
distribution of estimators, 355-362 



estimated regression line, 354 

mean and variance computation, 356-357

multiple linear regression, 394-405 

normal equations, 353-354 

notation, 360 

sum of squared differences, 353 

weighted least squares, 384-390 
Left-end inclusion convention, 15
Life testing 

exponential distribution 

Bayesian approach, 596-598
sequential testing, 590-594
simultaneous testing, 584-590, 594

hazard rate functions, 581-584 

maximum likelihood estimator of life 
distributions, 238-240 
Likelihood function, 230 

parameter estimation by least squares, 
602-604 

two-sample problem, 598-600 

Weibull distribution, 600-602 
Linear regression equation, 351-352 
Linear transformation, 381-384 
Line graph, 10 

Logistic regression model, 410-413
Logistics regression function, 410 
Logistics distribution, 192-193 
Logit, 411 

Lower control limit, 547-548, 552-553, 
555-559, 562 

M 

Mann-Whitney test, 525
Marginal probability mass function, 98 
Markov's inequality, 127-129 
Maximum likelihood estimator 

of Bernoulli parameter, 231-233 

definition, 230-231 

Kolmogorov's law of fragmentation, 237-238

of life distributions, 238-240 

of normal population, 236-238 

of Poisson parameter, 234-235 

of uniform distribution, 238 
Mean, see Sample mean 
Mean response

confidence interval estimator, 372-373, 

405-408 
prediction interval of future response, 

373-375, 410 
statistical inferences, 371-372 

Mean square error, 266-271

Median, see Sample median 

Memoryless property, 176-178 

Modal value, 22 

Mode, see Sample mode 

Mode of a density, 276-277 

Moment generating function, 126-127 
exponential random variable, 176 
gamma random variable, 183 
normal random variable, 169-170, 173 
Poisson random variable, 149-150, 154 

Monte Carlo simulation, 251 

Moving-average control charts, 563-565 

Multiple linear regression, 394-405 

Multiple regression equation, 352 

Multivariate normal distribution, 398

N 

Negatively correlated, 36 
Newton, I., 5 
Neyman, J., 7 
Nonparametric hypothesis tests 

definition, 515 

rank sum test 

classical approximation, 529-531
null hypothesis, 526
simulation, 531-533
T statistic, 525-529

runs test for randomness, 533-536

signed rank test, 519-525 

sign test, 515-519 
Nonparametric inference problem, 202
Normal data set 

approximately normal data set, 31

definition, 31

empirical rule, 32-33

histogram, 31 
Normal density, 275 
Normal distribution, 168-170 



Normal equations, 395 
Normal prior distribution, 275-277 
Normal random variable 
definition, 168 

moment generating function, 169-170, 173 
probability calculations, 174-175 
summation, 173 
Null hypothesis, 292 
analysis of variance 

one-way, 442, 445-446 
two-way, 464, 467-468 
Bernoulli populations, 323-330 
equality of normal variances, 321-323 
equality of two normal means 
known variances, 312-314 
paired t-test, 319-320
unknown and unequal variances, 318
unknown variances, 314-318
goodness of fit tests, 484-485, 494, 496, 502, 

504 
normal population mean with known 

variance, 293-305 
one-sided tests, 300-305 
Poisson distribution mean, 330-333 
rank sum test, 526 
regression parameter b, 363-365 
signed rank test, 521 
sign test, 515, 517, 519 



O

Observational study, 329

Ogive, 16 

One-way analysis of variance, see Analysis of 

variance 
Operating characteristic curve, 297 
Order statistics, 586 
Out of control process, 545 
Overlook probabilities, 76 



P

Paired data sets, 33-36

Paired t-test, 319-320

Parametric inference problem, 201-202

Pearson, E., 7 

Pearson, K., 6, 367, 490 
Permutation, 63

Pie chart, 12 

Point estimator, 240-241 

evaluation, 266-272

mean square error, 266-271

unbiased estimator, 267 
Poisson, S. D., 148 

Poisson distribution function, 155-156 
Poisson process, 179-181 
Poisson random variable 

applications, 150-153 

definition, 148 

moment generating function, 149-150, 154 

square root, 389-390 

tests concerning Poisson distribution mean, 
330-333 
Polynomial regression, 391-394
Pooled estimator, 259, 315 
Population, 3, 201 
Population mean, 202 
Population variance, 202 
Positively correlated, 36 
Poskanzer, D., 329 
Posterior density function, 273, 276 
Power function, 298 
Prediction interval, future response in regression, 

373-375, 410 
Prior distribution, 272, 275-277 
Probability 

axioms, 59-61

conditional, 67-70 

frequency interpretation, 55 

subjective interpretation, 55 
Probability density function 

joint probability density function, 99-100 

random variable, 93-95 

sample means, 203 
Probability mass function 

binomial random variable, 142 

joint probability mass function, 96 

marginal probability mass function, 98 

random variable, 92 
Probit model, 412 
Pseudo random number, 251 
p-value, 296, 303-304, 309, 311



Q

Quadratic regression equation, 393
Quality control, see Control charts 
Quartiles, 25-27 



R

Randomness, runs test, 533-536
Random number, 163 
Random sample, 217 
Random variable 

Bernoulli random variable, 141 
binomial random variable, 141-148 
chi-square distribution, 185-187 
conditional distributions, 105-107 
continuous random variable, 91, 93 
covariance 

definition, 121-122 

properties, 122-123 

sums of random variables, 125-126 
definition, 89-90 
discrete random variable, 91-92 
distribution function, 91-93 
entropy, 109-111 
expectation, 107-118 
exponential random variables, 175-181
F-distribution, 191-192
gamma distribution, 182-185
hypergeometric random variable, 156-160
independent random variables, 101-105
indicator random variable, 90-91 
jointly distributed random variables, 95-101 
logistics distribution, 192-193 
moment generating functions, 126-127
normal random variables, 168-175
Poisson random variable, 148-156
probability density function, 93-95 
probability mass function, 92 
sums of random variables, expected value, 

115-118 
t-distribution, 189-191
uniform random variable, 160-168
variance 

definition, 118-120 

standard deviation, 121, 126 

sums of random variables, 123-125 
Range of data, 27
Rank sum test 

classical approximation, 529-531

null hypothesis, 526

simulation, 531-533

T statistic, 525-529
Rayleigh density function, 583 
Regression coefficients 

definition, 352 

statistical inferences concerning regression 
parameters 

a, 370 

b, 362-370 

mean response, 371-373 
prediction intervals of future response, 
373-375 
Regression fallacy, 370 
Regression to the mean, 366-370 
Relative frequency, 10, 12, 15 
Residuals in regression, 358 
model assessment, 378-380 
standardized residuals, 379 
sum of squares 

chi-square distribution, 358-359 
computational identity, 360-362 
multiple linear regression, 397, 402-403 
Robust test, 305 
Row sum of squares, 460-461 
Runs test, 533-536 



S

Sample, 3, 201-202

Sample correlation coefficient 

coefficient of determination relationship, 378 

definition, 36 

positive versus negative correlations, 36 

properties, 37-40 
Sample mean 

analysis of variance for multiple comparisons 
of sample means, 450-452

approximate distribution, 210-212

definition, 17, 19, 202-203 

distribution from a normal population, 215 



joint distribution with sample variance, 
215-217 

probability density function, 203 
Sample median, 20-21
Sample mode, 21 
Sample percentile 

definition, 25 

quartiles, 25-27 
Sample space 

definition, 56 

spaces having equally likely outcomes, 61-67
Sample standard deviation, 24, 213 
Sample variance 

algebraic identity for computation, 23-24

definition, 22-23, 213-214

joint distribution with sample mean, 215-217
Scatter diagram, 34, 352 
S-control charts, 554 
Sequence of interarrival times, 181 
Shockley equation, 162 
Signed rank test, 519-525 
Significance level, 293, 306, 309 
Sign test, 515-519 
Simple hypothesis, 292 
Simple regression equation, 352 
Skewed data set, 31
Standard deviation, see Sample standard 

deviation; Variance 
Standard normal distribution function, 170-171,

175, 612
Standardized residuals, 379 
Statistic, 202 
Statistical hypothesis test 

Bernoulli populations, 323-330 

composite hypothesis, 292 

definition, 291 

equality of normal variances, 321-323 

equality of two normal means 
known variances, 312-314
paired t-test, 319-320
unknown and unequal variances, 318
unknown variances, 314-318

level of significance, 293 

normal population mean with known 
variance, 293-305 
Statistical hypothesis test (continued)

null hypothesis, 292 

one-sided tests, 300-305 

Poisson distribution mean, 330-333 

power function, 298 

p-value, 296, 303-304

regression parameter b, 363-365

robustness, 305 

simple hypothesis, 292 

t-test, 305-311 
Statistics 

definition, 1, 6 

descriptive, 1-2, 9 

historical perspective, 3-7 

inferential, 2-3 

summarizing, 17 
Stem and leaf plot, 16-17 
Subjective interpretation of probability, 55 
Sum of squares identity, 447-450 
Survival rate, 239-240 



T

Total time-on-test statistic, 586
t-random variable

distribution, 189-191 

probabilities for, 614 
Tree diagram, 166
T statistic, 306-307, 310, 368, 445-446, 

484-485, 489, 525 
t-test, 305-306

level of significance, 306, 309 



p-value, 307-310

two-sided tests, 307-311 
Two-factor analysis of variance, see Analysis of 

variance 
Type I error, 292 
Type II error, 292 

U

Ulfelder, H., 329 

Unbiased estimator, 267, 271, 357-358, 398 
Uniform distribution, 166-168 
Uniform random variable, 160-168
Unit normal distribution, 170 
Upper control limit, 547-548, 552-553, 
555-559, 562 



V

Variance, see also Sample variance
definition, 118-120 
standard deviation, 121, 126 
sums of random variables, 123-125

Venn diagram, 58 

W

Weak law of large numbers, 129-130 
Weibull distribution, 600-602 
Weighted average, 19 
Weighted least squares, 384-390 
Wilcoxon test, 525 

Within samples sum of squares, 443-445,
452-453 



